Slashdot Mirror


Google Revises Usenet Search

michaelmalak writes "Wednesday night, Google Groups announced in a thread the rollout of their revised 20-year Usenet archive search engine. Among the various 'improvements': ability to search by date has been eliminated, as has the ability to deep link to a single post. See the announcement thread for others' reaction." An anonymous reader writes "ZDNet has published some interesting insights into what makes Google tick. In this lengthy article, Google's vice-president of engineering, Urs Hölzle delves into the nuts and bolts behind Google's operations, what back-up mechanisms and hardware setup is in place and even some interesting homegrown technology like the Google File System (GFS)."

33 of 628 comments

  1. Progress? by danielrm26 · · Score: 5, Funny

    Among the various 'improvements': ability to search by date has been eliminated, as has the ability to deep link to a single post.

    Well damn - I hope they don't "improve" it too much more.

    --
    dmiessler.com -- grep understanding knowledge
    1. Re:Progress? by stupidfoo · · Score: 4, Funny

      Good point, why would I ever want to download "things" at a fairly constant 2.6-2.8Mbps (on my 3 Mb connection) from newsgroups when I can do so at a very inconsistent 50-100Kbps from a torrent?

      And from torrents I get the added benefit of not only downloading the file, but uploading to everyone else and broadcasting my IP address all over the place.

    2. Re:Progress? by Mondoz · · Score: 4, Interesting

      http://groups-beta.google.com/support/bin/request.py

      This is the feature request/bug reporting form.
      They claim to read every mail generated by this link.

      I just submitted a question about this.
      I wonder what they'd do if the full power of the /. was brought to bear upon this subject...

      --
      /sig
    3. Re:Progress? by chrish · · Score: 4, Interesting

      I used this to bitch about the non-standard HTML coming out of their site, and an actual human responded a few days after an auto-responder did.

      Of course, their HTML still doesn't validate...

      --
      - chrish
    4. Re:Progress? by forrestt · · Score: 5, Interesting

      Well, there might be a more practical reason than they simply don't care about standard HTML. It appears the main problem is they don't tell the doctype. That would take them an extra 118 bytes PER REQUEST to include the type. That means, according to the 1000 requests per second mentioned in the article, they are saving about 115KB per second in transfer by not including the doctype. It doesn't seem like much, but it is the same thing that got airlines to stop serving food. And this is just the Doctype. I'm sure they cut bites out wherever they can.
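      The arithmetic behind that estimate can be checked with a short sketch. The 118 bytes and the 1000 requests per second come from the comment above; treating a kilobyte as 1024 bytes is an assumption made here, and the result reads most naturally as kilobytes (not kilobits) per second.

```python
# Figures quoted in the comment: doctype size and request rate.
DOCTYPE_BYTES = 118
REQUESTS_PER_SECOND = 1000

# Bytes saved each second by omitting the doctype.
bytes_per_second = DOCTYPE_BYTES * REQUESTS_PER_SECOND

# Assumption: 1 KB is taken as 1024 bytes.
kilobytes_per_second = bytes_per_second / 1024

# The same saving expressed in megabits per second (decimal mega).
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(f"{bytes_per_second} bytes/s")
print(f"{kilobytes_per_second:.1f} KB/s")
print(f"{megabits_per_second:.3f} Mbit/s")
```

      Under these assumptions the saving is a little under one megabit per second.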

    5. Re:Progress? by NormalVisual · · Score: 4, Interesting

      I can see the practical side of this, but this is effectively saying "we're too cheap to implement the standard properly, so we're going to play fast and loose with it to save some money". Microsoft gets hammered for this kind of stuff (rightly, IMHO), so why not Google?

      --
      Please stand clear of the doors, por favor mantenganse alejado de las puertas
    6. Re:Progress? by Eil · · Score: 4, Informative


      Alright people, you can stop overreacting. They just rearranged some things, that's all.

      There's a link at the top of the thread to turn on the left-hand tree frame.

      Deep-linking to a single post is still very much possible.

      And I highly doubt that a search-by-date feature is going to go missing for long in a 20-year archive. This is, after all, a BETA.

      As per usual, Slashdot editors didn't even think it worth their time to follow a single link to see if the submitter wasn't trolling.

  2. hmmmm by meatspray · · Score: 5, Funny

    "Spelling: Google wrote its own spell checker, and maintains that nobody know as many spelling errors as it does. The amount of computing power available at the company means it can afford to begin teaching the system which words are related -- for instance "Imperial", "College" and "London". It's a job that many CPU years, and which would not have been possible without these thousands of machines. "When you have tons of data and tons of computation you can make things work that don't work on smaller systems," said Hölzle. One goal of the company now is to develop a better conceptual understanding of text, to get from the text string to a concept. "

    Next up: Grammar and Content

  3. 500 error? by Saint Aardvark · · Score: 4, Funny
    Oh my god, we Slashdotted Google!

    (Gathers canned goods, candles, heads for cave)

  4. Dumb by JavaLord · · Score: 5, Insightful

    Why would you remove the search by date function? That is insanely useful when you are looking for posts about a particular product, especially tech products where you might only want the most recent posts, or you might be searching for an outdated product.

  5. Improvements??? by Iphtashu Fitz · · Score: 4, Interesting

    Among the various 'improvements': ability to search by date has been eliminated, as has the ability to deep link to a single post.

    Gee, nice "improvements"... I personally have linked to individual posts on a web page summarizing a lawsuit I was involved in that was directly related to posts in a newsgroup. I know others who have linked to posts in similar situations. I just checked my web page and the links to those posts no longer work.

    Google just took a HUGE step backwards in my opinion.

  6. HW summary overview by grape jelly · · Score: 4, Informative

    The article states:

    - Over four billion Web pages, each an average of 10KB, all fully indexed.
    - Up to 2,000 PCs in a cluster.
    - Over 30 clusters.
    - One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
    - Sustained transfer rates of 2Gbps in a cluster.
    - An expectation that two machines will fail every day in each of the larger clusters.
    - No complete system failure since February 2000.

    Now, 2,000 machines in a cluster, plus 1PB data, plus 2Gbps in a cluster times 30 clusters comes to:

    - "Over" 60,000 PCs (!)
    - "Over" 30PB data storage
    - "Over" 60Gbps bandwidth
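    The totals above, and the reason a disk error rate of one in 10^15 starts to matter at this scale, can be reproduced with a small sketch. The per-cluster figures are the ones quoted from the article; reading the error rate as "per bit read" and a petabyte as 10^15 bytes are assumptions made here for illustration.

```python
# Per-cluster figures quoted from the article.
CLUSTERS = 30
PCS_PER_CLUSTER = 2000
PETABYTES_PER_CLUSTER = 1
GBPS_PER_CLUSTER = 2

# Lower-bound totals ("over" 30 clusters, "up to" 2,000 PCs each).
total_pcs = CLUSTERS * PCS_PER_CLUSTER
total_petabytes = CLUSTERS * PETABYTES_PER_CLUSTER
total_gbps = CLUSTERS * GBPS_PER_CLUSTER

# Assumptions: 1 PB = 10**15 bytes, and the quoted error rate means
# one bad bit per 10**15 bits read.
bits_per_cluster = PETABYTES_PER_CLUSTER * 10**15 * 8
expected_bit_errors = bits_per_cluster / 10**15

print(total_pcs, total_petabytes, total_gbps)
print(expected_bit_errors)
```

    Under those assumptions, reading one cluster's data once would be expected to hit several bit errors, which is why the article calls the rate a real issue.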

    Also interesting:

    - An expectation that two machines will fail every day in each of the larger clusters.
    - No complete system failure since February 2000.

  7. Re:A little respect by dave-tx · · Score: 5, Insightful
    who are we to question the removal of features?

    We're the users. That's our right as users. If nobody questions the decision to remove features, then how does Google know what features we liked?

    There's absolutely nothing wrong with constructive criticism, even with respect to a "free" service.

    --

    >> "What would the robut do? Frame someone!"

  8. Re:WTF? by Eric Giguere · · Score: 4, Interesting

    Well, I suppose it makes it easier to hide the stupid things some of us may have posted (especially in university) to Usenet back in the 80s and early 90s. Mind you, those "features" allowed me to resurrect some semi-useful postings I had made:

    Reading C Declarations: A Guide for the Mystified

    The ANSI Standard: A Summary for the C Programmer

    Eric
  9. Hey Google: you're being evil... by Anonymous Coward · · Score: 5, Interesting

    Try to search for a number using Beta and you'll see how broken it is.

    Also, it creeped me out to no end discovering this morning that my Gmail cookie is really a "Google Accounts" cookie which will now be attached to my Usenet forays via Google as well. I personally don't want the line between public and private conversations to be muddied like that, and I definitely don't want a unified cookie straddling both domains.

    Finally, the interface leaves a lot to be desired. The layout is cluttered and junky now whereas it was clean and simple before. I'm not enthralled by the Javascript hooks. Threading seems to be worse than ever (and still not done by message-ID or References - when I asked Google why this was via email, the response was "too difficult"... *boggle*) and the CLI-esque search ability is degenerating into a GUI mess; where one line of text and a CR would before get you to the page you wanted, it now can take that plus several additional mouse gestures and clicks.

    This is a sad day, to see a useful tool become so f**ked up for no apparent good reason. I can only hope and pray for a reversion.
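    For readers wondering what threading by Message-ID and References would involve, here is a minimal sketch. It is not how Google Groups works; the dict layout and the function names `build_threads` and `render` are hypothetical, and only the idea (each reply lists its ancestors in References, oldest first) is taken from the Usenet headers mentioned above.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads using Message-ID and References.

    Each message is a dict with a "message_id" string and a "references"
    list (oldest ancestor first). Returns (roots, children), where roots
    lists thread-starting IDs and children maps a parent ID to the IDs
    of its direct replies.
    """
    known = {m["message_id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        # The parent is the nearest referenced ancestor we actually hold.
        parent = next(
            (ref for ref in reversed(m["references"]) if ref in known), None
        )
        if parent is None:
            roots.append(m["message_id"])
        else:
            children[parent].append(m["message_id"])
    return roots, dict(children)


def render(ids, children, depth=0):
    """Return the thread tree as indented lines, one message ID per line."""
    lines = []
    for message_id in ids:
        lines.append("  " * depth + message_id)
        lines.extend(render(children.get(message_id, []), children, depth + 1))
    return lines


# Hypothetical sample posts; the IDs are made up for illustration.
posts = [
    {"message_id": "<a@example>", "references": []},
    {"message_id": "<b@example>", "references": ["<a@example>"]},
    {"message_id": "<c@example>", "references": ["<a@example>", "<b@example>"]},
    {"message_id": "<d@example>", "references": ["<missing@example>"]},
]
roots, children = build_threads(posts)
tree = render(roots, children)
print("\n".join(tree))
```

    A reply whose ancestors are all missing from the archive is treated as a new thread root here, which is one simple way to cope with gaps.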

  10. Respect is earned by Anonymous Brave Guy · · Score: 4, Insightful
    For all the years of good service we've had from google, who are we to question the removal of features?

    Excuse me, but their Google Groups feature is based entirely on profiting from others' work (and copyrighted work at that). If you're providing a properly searchable index, you might (might) have a public interest defence to the copyright infringement. If you're providing a useful service, most people might (might) not mind you using their work. But if you're going to take away useful searching facilities and provide a service that doesn't even allow proper citation (i.e., deep-linking to a specific post), you're going to be both unpopular and almost certainly breaking the law. I don't know about you, but personally I don't have much respect for people who are either of those things.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  11. two of the most useful features by RealProgrammer · · Score: 4, Interesting

    Absolutely. With so much spam and repetitive information on Usenet, I've always limited my searches by date.

    And linking to a single post is the whole point. I know it costs money to keep that stuff online, but surely they could find a way to put ads on deeplinked posts.

    Google just used up all its goodwill with me.

    --
    sigs, as if you care.
  12. Total catastrophe, a complete and utter misstep by BurkeTheEldar · · Score: 4, Insightful

    This is a disaster. I have hundreds of links to usenet articles via the old google groups. Those are all dead now. There is no browsable hierarchy of "groups"; no real message threading; far less info on a screen; what a mess. Google groups became my primary interface to usenet and my favorite aspect of google. It seems that google has completely lost its sense. This is one hell of a killer mistake by google.

  13. Re:A little respect by suso · · Score: 4, Insightful

    A little respect? Hah, unless they put these two features back within a week, they will cease to have any respect from me. I think I can safely cross Google off my "cool geeky things" list.

    I'm not sure what motivated such changes, but usually you don't remove enhancements to software unless they are causing major problems or if they somehow affect your financial bottom line. Somehow I think it's related to the latter of the two because I don't see how the former would cause problems.

    You don't do something like collect nearly all the usenet postings ever made, make it searchable by date and then take it away. Basically people have lost the ability to do historical internet research using google groups. Sort by date is not even close to the same.

  14. ARRRRRRRRGH by MarcoAtWork · · Score: 4, Insightful

    search by date is the most useful feature when searching about many topics, often limiting the search to the last 2 years (or excluding the last 4 for example) yields the results that one is looking for much more easily.

    I have bookmarks to specific articles/threads it took me a long time to find and to which I refer now and then and if they stop working the usefulness of google groups for me will be much reduced...

    As much as I understand why they would want to make USENET look more like a message board for people who never really grew up with it (usenet and gopher were mostly all we had back when I first went online) I still think that not having this functionality available for people who know how to make the most of it is very backward thinking.

    --
    -- the cake is a lie
  15. Re:OMG.. it's truly awful. by Kingpin · · Score: 4, Informative


    I believe that the "new groups" are not new usenet groups, but merely a yahoo-groups clone on the side, which gets the same interface as the one they provide for usenet groups.

    The old groups interface rocked. This is a major step in the wrong direction in my book.

    --
    Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
    Geocrawler error message.
  16. Direct Linking is still possible... by aridg · · Score: 5, Informative

    You can still do a deep link to a single article, if you like....

    Navigate to the thread, for example this comp.arch thread. Choose the post you want to link to, and click on "Show Options". Two of the options are "print", which is a link to a "printable" version of the article, and "Show original", which is a link to the article with all the headers.

    One more step (or simple URL hack) from this display is "view parsed" which gives a friendly HTML version -- for example, try this link.

  17. Re:RTFM by pbrammer · · Score: 5, Informative

    You are wrong. You are not on the new Google Groups page. There is sort by date, but not search by date. You want to look at groups-beta.google.com, not groups.google.com.

  18. Re:Evil? Re:Progress? by sk8king · · Score: 4, Insightful

    Unfortunately, I believe it to be inevitable that Google will become 'evil'. A single company that controls the search of all the information on the Internet.

    Search the web, newsgroups, your desktop etc. It may be all free and good now, but how long before someone pays the right price to access/control what people see.

    My experience is that Google search seems to be turning up more noise now than before. Two years ago I could with certainty do a search and get the page I wanted. Now it seems I must scroll through pages of commercial sites and such to get to the meaty part of the Internet...those little novelty sites that people put up themselves.

    Oh well, that's progress.

  19. What Google Hardware Actually Looks Like by jon3k · · Score: 4, Informative

    I was actually lucky enough to visit a datacenter in the southeast United States (which will remain nameless, but if you do a little searching, I'm sure you could figure it out) where Google colocates. I want to say they had something like 18,000 square feet just for them, behind a partitioned wall. We were *not* allowed back there, despite my pleading.

    Anyway, as we were walking around the 150,000+ square foot datacenter floor, a guy came by, pushing a very odd looking rack.

    It resembled a bread tray, 20 shelves if I counted correctly, with completely naked main boards sitting on them. It looked to be 4 machines per row (counting the power supplies). Each had one IDE disk sitting on a gel pad, strapped in with velcro. I personally watched them wheel 4 of these racks right by me back into the dark "Google" corner of the datacenter. Our tour guide finally gave in.

    Him: "Well, you've seen them now!"
    Me: "What do you mean?"
    Him: "That's Google!"

    Definitely the highlight of my day!

  20. Google File System by gtoomey · · Score: 4, Interesting

    Implementation details of the Google File System can be found in this paper by Google engineers.

  21. Deep linking is still very much possible! by Ivan Todoroski · · Score: 5, Informative

    Who was the idiot that started this rumor?

    Each message in a thread has a named HTML anchor, try this for instance. It will show the whole thread, but position you at an exact message in the middle.

    The only problem is there is no easy way to get this URL, you have to find the anchor by looking at the HTML source (Firefox's "View Selection Source" feature helps a lot).

    Also, if you click on the "Options" link by the individual message, you get a "Show original" link, which shows just the message, verbatim.

    And from there, you can click on "View parsed", and see just the pretty message, without the rest of the thread.

    So there's your deep-linking. I agree it's not obvious how to do it at the moment, but the ability is obviously still there. Give it some time, it's still a beta!

    These quirks and the "Server Error" bugs are to be expected, they'll work it out.

    As for the new browsing interface itself, I kinda like it. It integrates and borrows some stuff from their excellent Gmail interface.

    It hides quoted text by default (you can expand it with a single click), so you don't have to scroll through some moron's quoting of a whole message just to add a few words, it keeps a history of groups you recently visited, it allows you to bookmark topics you are interested in, etc. I do find it an improvement over the old interface.

    The only thing is the missing date search, I agree there, that was definitely a useful feature. If enough people complain, maybe they'll bring it back.

    Also, someone else complained that you cannot browse by group anymore... bullshit, it's staring you right in the face, it's the "Browse all of Usenet" link.
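    The "find the anchor in the HTML source" step described above can be automated. The following is a minimal sketch using Python's standard `html.parser`; the sample markup, the anchor names, and the `example.invalid` thread URL are all made up, since the real page structure is not shown in this thread.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the name attribute of every <a name="..."> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.append(value)


def deep_links(thread_url, html):
    """Return one URL per named anchor, pointing into the thread page."""
    collector = AnchorCollector()
    collector.feed(html)
    base = thread_url.split("#", 1)[0]  # drop any existing fragment
    return [f"{base}#{name}" for name in collector.names]


# Hypothetical page fragment with two named message anchors.
sample_html = (
    '<a name="msg_1"></a><p>first</p>'
    '<a href="/x">link</a>'
    '<a name="msg_2"></a><p>second</p>'
)
links = deep_links("http://example.invalid/thread/abc", sample_html)
print(links)
```

    Ordinary links without a name attribute are ignored, so only the per-message anchors end up in the result.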

  22. Re:Evil? Re:Progress? by robertjw · · Score: 4, Insightful

    Not to defend any evil company (won't publicly do that until I own one and it has made me a kazillionaire) but I'm not ready to count Google as an evil corporate entity yet. They are still in a relatively young and competitive market. They can't afford to piss everyone off at this point - so I'm guessing that they THINK they are making improvements.

    I remember when they originally took over the archive from deja. I was devastated - convinced they were going to totally screw it up. They didn't, or I got used to the screwed up version.

    Also, regarding noise appearing in searches, this is a standard cycle that all search engines go through and Google's experiences are well documented. They are constantly changing their search engine to give the most relevant results. Gradually commercial sites that depend on high search results spend enough time and money optimizing their site. Google is constantly changing their tech to push that noise down, but it always gradually floats back to the top. It's in Google's best interest to show commercial sites in their paid ads, not in the valid search results.

  23. Google changed within the past three hours by michaelmalak · · Score: 4, Interesting

    I submitted the Slashdot story at 8:30am EST. At that time, groups.google.com went to the Beta. Now at 11:15am EST, groups.google.com is the old version, and the Beta has been relegated to a "Preview" link. Sometime in between, Google changed.

  24. Goodbye Google? by ngunton · · Score: 4, Interesting

    This may be a little off-topic, but it's been on my mind recently so I thought I'd mention that I recently blocked Googlebot from my website. Why? Because they were using a new version of the bot that was requesting pages WAY too rapidly, as in tens of pages every second. This new version pretends to be a "real" browser (using the "Mozilla (compatible)" format). The old version (User-Agent begins with "Googlebot") was also present, and requesting pages politely. I think this new version was part of their recent effort to regenerate their index and "deep scan" websites, because it was shortly after this that they advertised their index doubling in size.

    There were other issues as well as the rapacious spidering (which reminded me of some of the worst spambots out there), but I won't go into the details here. I didn't get any satisfactory resolution from Google when I tried contacting them.
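    One conventional way to block a crawler by name is a robots.txt rule, which only helps with crawlers that choose to honor it. The sketch below checks such a rule with Python's standard `urllib.robotparser`; the rule set, the user-agent strings, and the `example.invalid` URL are illustrative assumptions, not the poster's actual configuration, and it only covers a crawler that announces itself with a plain "Googlebot" token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user-agent strings, not taken from real server logs.
googlebot_ok = is_allowed("Googlebot/2.1", "http://example.invalid/page.html")
other_ok = is_allowed("ExampleBrowser/1.0", "http://example.invalid/page.html")
print(googlebot_ok, other_ok)
```

    A crawler that requests pages too quickly regardless of robots.txt would need server-side rate limiting instead, which this sketch does not attempt.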

    Website suicide? I don't know. All I do know is that Google seems to be fulfilling my biggest fears - they are going downhill as they get bigger. Funny how the bigger a company gets, the more it tends to suck. Also, having an IPO is never a good thing, in my experience - it always leads to short-termism and corporate decisions based more on the bottom line than what's actually good for the users. Sure, any company has to look after its shareholders and investors, but they never seem to really grok that being so focused on the short-term negatively impacts things in the longer term, particularly if it loses you goodwill in the userspace. Also, as a company grows you do tend to get the sort of braindead, clueless decisions coming out that we apparently see here.

    So now we have Google restricting what we can do with old Usenet posts... didn't they buy up all the archives for this stuff a while back? This would appear to give them some amount of power, but also (they should realize) responsibility as stewards of the past. This is not something that they are simply indexing on someone else's website, it's data that they actually own. But in this case it's not really their data at all - it's the community's.

    Google seems to be slowly using up the goodwill they built up since 1998 when they came onto the scene, a small, fast, simple, charming and relevant search engine that kicked ass. Why can't a company just keep doing what it does well, and be satisfied with that? Why does everything have to eventually grow, expand, gobble up other companies, and then inevitably start to suck?

    Never mind... for now, Goodbye Google.

  25. First use of "spam" on USENET, found via Google by notthepainter · · Score: 4, Interesting
    A friend forwarded this to me several years back.

    http://groups.google.com/groups?q=ken+weaverling+spam+usenet+first&hl=en&selm=9v6d5h%245pg%241%40news.dtcc.edu&rnum=1

    According to Ken and his search of google, I was the first person to ever use the word "spam" to refer to unwanted electronic communication. Obviously, I didn't know it at the time and was quite surprised to learn of my "fame." Yeah, that and $7 will get me a cup of mocha-something, I know.

    Anyhow, the whole point is that Ken's research was aided by the search by date feature. It will be a shame if that is removed.

    (And for the curious, I changed my name from Czarnecki when I got married.)

  26. Re:Evil? Re:Progress? by ArsonSmith · · Score: 4, Interesting

    google needs an "I'm not shopping" flag you can put into the search string like !shopping or something. Maybe I will suggest that on that link up a few posts.

    --
    Paying taxes to buy civilization is like paying a hooker to buy love.
  27. Don't like how the Google Usenet archive evolves? by martin-k · · Score: 5, Interesting
    If you don't like how Google's Usenet search engine and archive evolves (neither do I; Dejanews was tops for its time and things went downhill from there), help the competition... :-)

    I already have an archive of around 600 million messages (nearly everything sans binaries from 2000 till today; just a couple of terabytes) and intend to create a public Usenet search engine. As I am using Usenet myself on a daily basis, I know what *I* want in a Usenet search engine, and that's quite different from what Google gives us.

    Here's how you can help: Contact me at martin-k (at) softmaker.de if you have a private collection of Usenet postings that you want me to put in the database.

    -mk