Slashdot Mirror


Google To Create "Blog" Search; Potentially Remove From Main

Skyshadow writes "Google, search engine of choice for pretty much everyone, has announced that it will begin a separate index for blogs and remove them from the normal index, handling them instead in much the same way as their usenet archives. This will hopefully put an end to the recent difficulties locating primary source material among the mountains of blogs which are clogging the ratings system." There have been comments elsewhere saying they won't be removing them - but that remains to be seen.

37 of 304 comments

  1. journals by asv108 · · Score: 5, Interesting
    Will /. journals be included in this?

    Is there any chance of having an RSS feature for journals, for everyone or even just subscribers?

    1. Re:journals by jawtheshark · · Score: 5, Interesting
      --
      Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
    2. Re:journals by Cyberdyne · · Score: 5, Informative
      Oh, wait.... It says "User-agent: Mediapartners-Google*" can scan everything. This surprises me however. Still, that's not "GoogleBot", which I see from time to time in my apache logs.

      Anybody got an idea what "Mediapartners-Google*" exactly is?

      Mediapartners-Google would appear to be Google's ad engine - it tries to determine "relevant" ads for the page by spidering it beforehand. Presumably, you would only see hits from that bot if you serve Google text-ads; GoogleBot is the crawler which drives the actual search engine.
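To illustrate the per-crawler distinction described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the example.com URL are made up for illustration and are not the real Slashdot file. The trailing `*` from the quoted user-agent line is dropped here, because Python's parser matches agent names by substring rather than by wildcard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the ad crawler may fetch everything,
# every other crawler is shut out. Not the real Slashdot file.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""

def allowed(agent, url):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    url = "http://example.com/journal/123"  # placeholder URL
    for agent in ("Mediapartners-Google", "Googlebot"):
        print(agent, allowed(agent, url))
```

With a file like this, the ad crawler and the search crawler get different answers for the same URL, which would explain seeing one bot in the logs and not the other.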

      (Aside: Those text ads were quite tricky to filter out - not being images, there's no 'block images' option! Putting "127.0.0.1 pagead.googlesyndication.com" in /etc/hosts did the trick, though...)
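The hosts-file trick works because, on typical setups, the resolver consults /etc/hosts before DNS, so the ad hostname resolves to the local machine. A small sketch, assuming the usual hosts format (an address followed by one or more names, with `#` starting a comment); `blocked_hosts` is a hypothetical helper and the sample text just reuses the entry quoted above:

```python
LOOPBACK = {"127.0.0.1", "::1"}

# Sample hosts content; the second line is the entry from the comment above.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.0.1 pagead.googlesyndication.com  # text ads
"""

def blocked_hosts(hosts_text):
    """Collect names (other than localhost) mapped to a loopback address."""
    blocked = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        if address in LOOPBACK:
            blocked.update(
                name.lower() for name in names if name.lower() != "localhost"
            )
    return blocked

if __name__ == "__main__":
    print(blocked_hosts(SAMPLE_HOSTS))
```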

    3. Re:journals by cygnusx · · Score: 5, Interesting

      > Those text ads were quite tricky to filter out

      You're entitled to block them if you wish, of course, but if the ads don't consume too many bits, and bring the site-owner some moolah, and don't interfere with your browsing, how does blocking text ads help?

      Knee-jerk ad-blocking will only kill free content on the net, imho.

  2. blogs.google.com? by fewnorms · · Score: 5, Insightful

    Thing is, some of these blogs actually contain some pretty handy info from time to time, as blogs are becoming more and more used as a cheap and easy alternative to a content management system imho ....

    --
    Veni, Vidi, Velcro!
    1. Re:blogs.google.com? by GT_Alias · · Score: 5, Interesting
      Which is why Google is not eliminating them entirely, just moving them over to their own search.

      It's a reasonable solution, I think. Is it worth tainting the vast majority of the search results with useless blog entries just so that the (very) few blogs with good information will still show up?

      This solves their problem with bloggers manipulating search results, yet still keeps the information available to those who want it. Granted, you have to know to look for it, but it seems to me like a fair trade-off.

    2. Re:blogs.google.com? by simong_oz · · Score: 4, Funny

      some of these blogs actually contain some pretty handy info from time to time [my emphasis]

      yeh, that's true, but let's face it - the vast majority are complete and utter drivel and manage to make a cereal packet look like an interesting read.

      --
      "Because it's there." - George Mallory, when asked why he wanted to climb Mt Everest, March 18, 1923 (New York Times)
    3. Re:blogs.google.com? by poot_rootbeer · · Score: 4, Funny

      let's face it - the vast majority are complete and utter drivel and manage to make a cereal packet look like an interesting read.

      But Slashdot is a weblog... oooh, I see.

  3. 'Bout time by Surak · · Score: 4, Interesting

    I, for one, am sick of searching material only to find that the page is some asshat's blog. Nothing against blogs, but you never know where this material came from.

    OTOH, what constitutes a 'blog'? Is Slashdot a blog? Is this a blog? The lines are constantly being blurred, and I'm not sure it'll be easy for google to make that distinction.

    1. Re:'Bout time by Qzukk · · Score: 4, Insightful

      Probably the distinction they will make will be between publicly-available blogging space (livejournal,deadjournal,pitas, and so on) and a personal website that is or contains a blog. This would be the easiest way, since it comes down to setting aside a few hostnames for the new search engine to crawl.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
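The hostname-based split suggested in the comment above could be sketched roughly as follows. The host list is illustrative only (built from the services named there, with `.com` domains assumed), and `is_hosted_blog` is a hypothetical helper, not anything Google has described:

```python
from urllib.parse import urlparse

# Illustrative list of public blog-hosting domains (an assumption).
BLOG_HOSTS = ("livejournal.com", "deadjournal.com", "pitas.com")

def is_hosted_blog(url):
    """True if the URL's host is one of BLOG_HOSTS or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOG_HOSTS)

if __name__ == "__main__":
    for url in ("http://someone.livejournal.com/2003/05/",
                "http://example.org/blog/"):
        print(url, is_hosted_blog(url))
```

A personal site that merely contains a blog, such as `http://example.org/blog/`, falls through to the main index under this rule, which matches the distinction drawn above.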
    2. Re:'Bout time by arvindn · · Score: 5, Insightful
      what constitutes a 'blog'?

      I was wondering about that too. It's not black and white, of course, especially when you want to automate it. I can think of several indications that a page is a blog; some weighted linear combination of these factors should work well enough in practice if you spend some time tweaking the weights:

      • Updated frequently
      • Keywords like "blog", "weblog", "posted by", "comments", "permanent link", and so on.
      • Got dates all over the place
      • Is hosted on one of the popular blogging sites (blogspot, lj, /. journals...)
      • Links to and is linked from other weblogs.
      This last factor is important. If you start from a rough heuristic and execute an iterative algorithm, similar to how they calculate pagerank, your blog detection algorithm will get better.
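A rough sketch of the scheme described above: a weighted sum of per-page signals, followed by a few rounds of blending each page's score with the scores of its link neighbours. Everything specific here is assumed for illustration: the signal names, the weights, the 0.5 blending factor and the round count are not from any real system, and the update rule is only loosely PageRank-like.

```python
# Hypothetical weights for the signals listed above.
WEIGHTS = {
    "updated_often": 0.2,
    "blog_keywords": 0.3,
    "many_dates": 0.1,
    "known_blog_host": 0.4,
}

def base_score(signals):
    """Weighted linear combination of 0/1 signals for one page."""
    return sum(WEIGHTS[name] * float(signals.get(name, 0)) for name in WEIGHTS)

def refine(base, links, rounds=10, blend=0.5):
    """Pull each page's score toward the mean score of its link neighbours.

    `base` maps page -> base score; `links` maps page -> pages it links to.
    Links are treated as undirected, since the last factor above counts
    both "links to" and "is linked from".
    """
    neighbours = {page: set() for page in base}
    for src, dsts in links.items():
        for dst in dsts:
            if src in neighbours and dst in neighbours and src != dst:
                neighbours[src].add(dst)
                neighbours[dst].add(src)
    scores = dict(base)
    for _ in range(rounds):
        updated = {}
        for page, own in base.items():
            near = neighbours[page]
            if near:
                mean = sum(scores[n] for n in near) / len(near)
                updated[page] = (1 - blend) * own + blend * mean
            else:
                updated[page] = own  # isolated pages keep their base score
        scores = updated
    return scores
```

A page with weak on-page signals that links to an obvious blog ends up with a higher score than its base score, while a page with no blog neighbours is left alone. Picking the threshold above which a page counts as a blog is the "tweaking" part.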
    3. Re:'Bout time by EinarH · · Score: 4, Interesting
      The other day I searched Google for some radio stuff. (helping my father find some equipment).

      Then I noticed that Radio Userland appeared very high on Google. In fact, when you search for "radio"* they get a #5 at Google. As far as I know they have only existed for a year. And their popularity, as it appears on Google, looks very inflated because of the huge number of links in blogs.

      Checked out Daypop.com, which ranks articles/links based on the number of links in blogs. This is what I got:
      Searching All Weblogs for link:radio.userland.com... Found 3260 pages matching query.

      That's insane. When so many blogs link to the same page, its ranking on Google gets very high based only on blog popularity.


      *Searching for only radio is obviously a bad idea as Google returns some 40 million hits.

      --

      Melius mori in libertate quam vivere in servitute.

    4. Re:'Bout time by Anonymous Coward · · Score: 5, Insightful

      I, for one, am happy of searching material only to find that the page is some asshat's blog...

      because what is important, in my point of view, is to GET THE ANSWER to what I'm looking for.

      And if the answer is in a weblog that belongs to "Linux-freaks.Adhzerbahidjan", it still is the answer I'm looking for...

      I mean, answers to things like "Proftpd doesn't seem to accept fxp connections" or "why the hell is this part of my distro not working as I wish" can only be offered by people having the same problem and discussing it in a blog.

      Another reason I prefer Weblogs to, say, IRC is that I don't have to humiliate myself asking "basic" questions to the 15 year old Guru that is nicknamed "EvilRootBeer" , I just have to parse a few blogs and get my answer without ANY fine manual to read.

      "Nothing against blogs, but you never know where this material came from." Because you KNOW where the news from CNN is coming from? I mean, they show proof and research material every time they air a show, or a major groundbreaking news story ("Mass destruction weapons found in Iraq", "Terrorist Bretzel Fails Coup d'Etat"...)

      At least with blogs and the net, you can try and cross check the data, whereas with TV, you usually only gulp some more Mountain Dew.

      I just wish you had to find your Linux docs using the manuals provided on the distro and absolutely no other access to raw data...

    5. Re:'Bout time by mcmonkey · · Score: 5, Funny
      • Got dates all over the place

      Well, that rules out /. Anyone who spends a lot of time here certainly doesn't get dates all over the place.

  4. Yes! But will there be a metasearch? by DeHar · · Score: 5, Insightful

    This is a great idea, especially since many issues have much more commentary than source content. I love the quote "But what happens when the weblog fad dies down?"

    However, I hope they maintain links between the main search and the blog search. Finding primary sources, then having a button linking to all blog comments on this topic, would be a great research tool.

  5. Good to weed out.... by caffeinex36 · · Score: 5, Interesting

    Most of the useless information people put into blogs. Although, when you search for information, would you want to search 2 different locations? This is Google's whole claim to fame. I have found that many times people post how-to's in their blogs along with other information.


    If it ain't broke...don't fix it

    -Rob

    1. Re:Good to weed out.... by Zathrus · · Score: 5, Interesting

      If you know how to do serious web searches via Google then you're already searching at least 2 locations - the main Google search and the Google Groups search. You may also search Google News separately (although the info from there is usually in the main search as well).

      I'm looking forward to this, since most of the stuff Google hits in blogs is completely and utterly irrelevant to what I'm actually trying to find. Google will probably just have another tab to click on, or perhaps a few top links to blog-specific searches if they think it's relevant (like they do with cross links to Google News searches currently). Perhaps even a configurable "Include Blogs" on the preferences page. Whatever, I don't care, just let me exclude the damn things.

      If I don't get what I'm looking for in regular search then I may go search Blogs as well. After newsgroups.

  6. Personally.. by xchino · · Score: 5, Insightful

    I've found some of the best information on blogs. I have no problem with them making a blog specific search, but like the Linux specific search I hope relevant sites can still be found from the main search. It would be a pain to have to search every individual Google engine for one bit of info. As it is now, I can use the main search and be pretty sure that I'm going to get a relevant result regardless of what category the site falls under. If I'm looking up what IIRC stands for, I don't really care if I get the info from JoeBlow's blog or from howstuffworks.com.

    --
    Everyone is entitled to their own opinion. It's just that yours is stupid.
  7. Yes! by acehole · · Score: 5, Funny

    Now I can find and read about people's mundane activities more efficiently.

    e.g.

    9:30am :- ate some toast

    9:40am :- went to the toilet

    10:50am :- left the toilet to check the number of hits on my blog

    11:45am :- got a phone call, was wrong number. Might get a real call one day.

    and so on and so forth..

    --
    Be you Admins? nay, we are but lusers!
    1. Re:Yes! by Timesprout · · Score: 4, Funny

      By your toilet reference I can see you have obviously mistaken a stream of diarrhea for someone's stream of consciousness .... easy mistake to make with many of the blogs out there.

      --
      Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
      What truth?
      There is no dupe
  8. Will they at least link to the new search? by gleffler · · Score: 4, Insightful

    I just hope that Google does at least say "Hey, you might be able to find what you're looking for on our blog search" at the end or something - like they do now with Google Answers. I do applaud their effort to make their database even more relevant though, and it is yet another reason I have to admit to being a shameless whore for Google.

  9. An end to 'Googlewashing'? by Snowhare · · Score: 5, Interesting

    I wonder if this is also intended to stop Googlewashing? Google has a history of trying to 'play fair' - and the power of a few well connected blogs to basically 'take possession' of any term works against that philosophy.

  10. blogs by Blocked+By+Sand · · Score: 5, Interesting

    One of the biggest newspapers in Norway, where I live, has recently said they believe blogs to be the new 'killer app' for delivering information on the net. The problem with that is that the threshold for publishing 'news' is so low, anybody can do it. This makes it very difficult for people to find the info they are looking for. At the same time there is no guarantee the info is useful or even correct. A good reputation will be more and more important for businesses and sites on the net.

    This move by Google tells me newspapers in Norway aren't the only ones seeing how influential blogs will/could become. This would be a truly great step forward if Google could come up with a way of rating the different blogs. That way you could easily find serious tech-blogs.

    Wonder what rating /. would get though ;)

    --
    Be like the twenty-second elephant with heated value in space-Bark!
  11. is ./ a blog? ebay? by loomis · · Score: 5, Interesting

    As a previous poster briefly mentioned, what exactly is a blog? Would Slashdot forums be considered a blog? What about the myriad of ezboard message board forums out there, as well as other discussion websites? If the answer is no, it would seem difficult, and perhaps only of minor benefit, to separate just the true "blog" sites while ignoring the other sites.

    And what about ebay? Quite often I am searching for info on an old piece of electronics I've picked up someplace, and I do a Google search, hoping to find information about the item. Well, all I get in return are ebay links to a similar item that was sold on ebay a few months ago. And even then, I click on the link, hoping to see what the item sold for (and thus get an appraisal), but the auction has been removed from the database due to it being several months old. Why index ebay pages? It's really frustrating.

    Loomis

    --
    "The television is the retina of the mind's eye" - Videodrome
  12. The Register is... a bit off by friedegg · · Score: 5, Informative
    "GoogleGuy" (a real Google employee) commented on this on WebmasterWorld saying:
    I think Andrew Orlowski is taking a comment and taking it in the direction that he wants to go. I would take that article with a grain of salt.

    GoogleGuy, going for understatement. :)
    --
    Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
  13. I'd rather they do this for mailing list archives by Thoguth · · Score: 4, Interesting

    I really don't mind finding blog links when I search for something, as they usually at least link to some relevant sources.

    On the other hand, it is really a pain to search for help on something, and instead of getting a useful, authoritative document, I'll get a half-dozen archived unanswered mailing list posts from people with the same problem. I would much rather Google address this dilution from mailing lists.

    --
    The requested URL /iframe/sig.html was not found on this server.
  14. Bad Idea by rwiedower · · Score: 4, Interesting

    I work at a company that has a blog-like recap of political news of interest for our clients and friends. If google tries to separate all sites with blog-like content, won't this naturally reduce my rank without actually increasing the source of information? Or am I missing something? How is google going to search for blog-like sites?

  15. Blur by limekiller4 · · Score: 4, Insightful

    Why not just create a "-source" flag or, as has been suggested, "-noblog"? Why are blogs being marginalized as any less authoritative than other hits? Why is using "-" (eg: ["trading cards" -hockey]) utilized for weeding out certain criteria but not employed here when the goal is the same? Could we at least have a flag for combining the two results?
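For illustration, the exclusion syntax mentioned above could be parsed along these lines. This is a sketch only: `-noblog` is the suggestion from this thread rather than a real Google operator, and `parse_query`, `matches` and the `is_blog` field are made-up names.

```python
import shlex

def parse_query(query):
    """Split a query into required terms, excluded terms, and flags.

    Quoted phrases are kept together. shlex raises ValueError on an
    unbalanced quote, which this sketch does not try to handle.
    """
    include, exclude, flags = [], [], set()
    for token in shlex.split(query):
        if token == "-noblog":  # hypothetical flag
            flags.add("noblog")
        elif token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude, flags

def matches(doc, query):
    """True if `doc` (a dict with "text" and optional "is_blog") fits the query."""
    include, exclude, flags = parse_query(query)
    if "noblog" in flags and doc.get("is_blog", False):
        return False
    text = doc["text"].lower()
    return (all(term in text for term in include)
            and not any(term in text for term in exclude))

if __name__ == "__main__":
    doc = {"text": "Ferret care tips from my weblog", "is_blog": True}
    print(matches(doc, "ferret care"), matches(doc, "ferret care -noblog"))
```

Handled this way, the blog filter is just one more exclusion term, and leaving the flag off gives the combined results.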

    A comparison is being made between blogs and the newsgroups, which are worlds apart in a number of different ways, not the least of which is the threaded nature of the groups.

    What defines a blog, anyway? What defines a not-blog? Is CNN.com a blog? Is it not a blog because many people write for it, because of the number of hits it gets or because it has press credentials? Which category does indymedia.org fit into?

    Will I only get news results when I search for "ferret care?"

    What if the source IS a blog? If the subject IS the blog, will a news site reporting on the blog wind up in the main search results while the subject itself -- the blog -- is only in the blog search?

    --
    My .02,
    Limekiller
  16. Blogs removed from google = FUD by lysurgon · · Score: 5, Insightful

    There's no indication whether or not blogs will be left in or out of search results. This is very different from USENET, which was never part of the web in the first place. Orlowski is far from an unbiased source on this, having published many articles critical of bloggers in general. While two sources are cited which are critical of the effect that blogs have had on the Google ranking algorithm, none are cited which show the contributions personal publishers have made to the info-sphere.

    Far more authoritative sources than I have already weighed in on this.

    While there's certainly a lot of inane content available in blog form, this isn't really any different than it was before. I have never had to wade through 500 pages of results to find an original source either. The whole thing reeks of FUD to me. Methinks that Orlowski and Roddy have their own axes to grind.

    1. Re:Blogs removed from google = FUD by lysurgon · · Score: 4, Informative
      (replying to own post)

      So here's what should be the final word:


      If Google didn't find that blogs improved the results (and I don't know, I would assume they test these things, like, constantly), do you suppose they'd increase the frequency at which they crawl them, or decrease it? Yes, that's what I think.


      From evhead
  17. Re:Ev from Blogger by Anonymous Coward · · Score: 4, Informative
    Right, go to this entry at evhead, and view the source, you'll see:

    <span title="you know, in order to spread more 'Google censors Evhead' suspicions"></snip></span>
    <!-- Andrew Orlowski strikes with another brilliant theory designed to get attention from bloggers (even though the number of their readers is of course "statistically insignificant"). Well shit, I'm biting.

    Based on Eric Schmidt's mentioning of a blog search, Orlowski suggests that Google will remove blogs from the main index.

    This shouldn't surprise many people, but as far as I know, Orlowski is full of crap. Again. If Google didn't find that blogs improved the results (and I don't know, I would assume they test these things, like, constantly), do you suppose they'd increase the frequency at which they crawl them, or decrease it? Yes, that's what I think.

    Too bad my headline isn't any truer than the Register's.-->

  18. They should separate mailing list archives first by Anonymous Coward · · Score: 5, Insightful

    If they really want to make their search engine useful, they ought to separate out Web archives of mailing list discussions. Blogs usually link back to where they got the story, so with only a little digging, you can find the original material. Mailing list discussions, though, are often out of date, irrelevant, and lacking in easy-to-follow references. They annoy me much more when I'm looking for things on the Web.

  19. Re:/. is a blog, no? by RobotRunAmok · · Score: 4, Interesting

    /. is a blog, no?

    No. SlashDot aggregates news stories. It's the Web generation of what the BBS guys had in CompuServe Forums and GEnie Roundtables. The staff is paid to aggregate and thread stories that are of interest to a particular community. (Sometimes they aggregate the really, really good ones more than once.) Technically, SlashDot staff don't submit the stories, members of the community do. Bottom line: it's a professional operation. (g'head, g'head, make the jokes, it's Monday, get 'em outta yer system...)

    Personally, I would use the litmus test of "professionalism" when doping out what is a blog versus what is "legitimate" content. If the "blogger" makes his living as a writer or journalist, then the blog is "supplemental online material." If the site is, as we referred to the vanity publishing phenomenon back in the early '90's, someone's "homepage," but with the added baggage of semi-regular diary entries, then it's a Blog.

    Use of "blogging software" doesn't make someone a writer, or a journalist, and it certainly doesn't automatically grant its user something worth saying, or even something factual to say.

    It's great to see Google realizing this and clamping down.

  20. Bloogle or Bloggle? by Bonewalker · · Score: 5, Funny

    I can just see the red, blue, yellow, and green logo...BLOOGLE. Will the new term for searching blogs explicitly be "Bloggling", or will it be "Bloogling"?

  21. Bullshit. Please read. by sethadam1 · · Score: 4, Insightful

    Slashdot IS a blog. Because we're not talking about some Google employee sitting around and making a judgement call on every link on the net, it's obviously going to be automated by robots.

    Slashdot, like other blogs, pollutes search engine searches with their "permalinks," which, although they might be useful, certainly constitute a blog. In fact, one of the problems with blogs and search engines is that they generate thousands of clickable hyperlinks effortlessly. It's great for someone reading a blog and trying to bookmark a certain section - it's terrible for the guy who wants information on combatting spam through more effective use of his SMTP server and has to search through 30 pages of /. and K5 chatter to find some substance.

    Certainly, Google's criteria for what defines a blog might be helpful, but it seems to me like you're subjectively deciding which blogs are legitimate news sources and which are "some kid rambling on." Say whatever you like about the legitimacy of /., but make no mistake about it, it's a blog.

  22. The real story: Orlowski (successfully) trolls /. by marmoset · · Score: 5, Informative

    Oy. If Slashdot had managed to perform even a minimum amount of editorial diligence (which, pot, here's kettle, is what the Register rails on bloggers for not doing), they'd have found pretty quickly that this article is yet another installment in Andrew Orlowski's (an up-and-coming Dvorak-wannabe) ongoing jihad against weblogs. Don't believe the hype.

  23. Not Quite by emmastory · · Score: 5, Informative

    Google hasn't announced any such thing, at least as far as removing weblog content from the main search is concerned. If you read the article, you'll note that it's Orlowski speculating about a Slashdot comment, of all things - specifically, a comment from the William Gibson blog thread. evhead posted about this Register article on Friday.