Slashdot Mirror


Google Experiments

gafferted writes "The boffins at Google have been experimenting with new toys, such as Keyboard Shortcuts and Glossary, but the most fun is Google Sets. Try "green, purple, red" to get a set of 40 different colours. Try a set that contains both Richard Stallman and Bill Gates, see what Google associates with Slashdot, or ask for a set of rude words."

5 of 186 comments

  1. Google Blog by nob · · Score: 5, Interesting

    If you're a huge Google fan (and aren't we all?), check out Google Weblog. They had this story 2 days ago, plus they keep you up to date on other cool Google happenings.

    And no, it's not my site. I just think it's cool.

    --
    daed si luap
  2. Fark-like Not Safe For Work by heliocentric · · Score: 2, Interesting

    I work at a place that is kind of touchy about the content served up to those who signed the agreement to be allowed online, and I think that link to rude words needs one of the Fark "Not Safe For Work" tags after it.

    Yeah, the thing doesn't link to boobies, but grepping for incoming text vs. grepping for inbound boobies is a tad easier for log generation.

    Besides, I thought rude words just involved being insensitive, not foul.

    --
    Wheeeee
  3. Re:Google *do* cach itself, and the result is funn by gakguk · · Score: 2, Interesting

    If you're talking about the Dilbert thing, it is by design.


    If you're talking about the language thing, it is your mother tongue. ;) (You know, the &hl=sv&ie=UTF8 part)

  4. Re:Very Impressive by nemesisj · · Score: 3, Interesting

    Maybe I should have clarified a little more. Amazon's suggestions never work for me, because I like old school, early nineties grunge and they figure that music sort of falls into "heavy" and "not heavy" categories. So I'm constantly recommended stuff by Metallica, System of a Down, Primus, etc., which are all heavy bands, but which don't fit the particular vein of heavy music I like. Google also made the jump to returning Godsmack, which is a newer hardcore band with a lot of grunge fundamentals (which I like), and Tool, an artsier, more emo-influenced heavy band, but still with grunge influences and ties. I don't think Godsmack would ever refer to themselves as grunge, but the connection exists, and Google figured it out. Extremely impressive - after all, this is a highly subjective area, but one that still has some overall generalizations to it.

  5. Quick theory on how Google Sets works by kindofblue · · Score: 2, Interesting

    I'm guessing that Google Sets could work something like this.

    Each query phrase produces a set of documents, i.e. web pages. The intersection of those sets gives a small set of docs, which is pretty much what a normal Google query (or any search engine) returns if all the query terms are ANDed. The new feature is then to intersect the terms of all the docs in that doc-intersection set, that is, to return all the terms that are common to all of those docs.

    e.g. in pseudo-code. Assume:
    - G is the normal Google search engine.
    - G.query("search phrase") returns a set of references (URLs) to docs, e.g. {u1, u2, u3, ...}.
    - u.terms() returns a set of all the words contained in the doc referenced by u, e.g. if u == "http://slashdot.org", then u.terms() == {"news", "for", "nerds", "slashdot", ...}.
    - * is a set intersection operator.

    s1 = G.query(q1); s2 = G.query(q2); s3 = G.query(q3); ...
    docSets = s1 * s2 * s3 * ...;   // docSets contains the URLs of the docs that have all the query terms
    ws = docSets[0].terms();        // ws will hold the running intersection of the sets of words in all the docs
    forall url in docSets {
        ws = ws * url.terms();
    }
    return ws;

    So my guess is that ws is the final set of terms returned by Google Sets. Of course, the words should be sorted by some meaningful metric, e.g. frequency. This is all very easy to implement and fast to run, because both the document-set intersection and the word-set intersections can be computed quickly when the word and document vectors are stored as sparse vectors.
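
    For anyone who wants to play with the idea, here is a minimal Python sketch of the guess above. It is not how Google Sets actually works. The toy CORPUS, the example.com URLs, the single-word queries and the function names query, terms and google_set_guess are all assumptions made up for illustration; query and terms stand in for G.query and u.terms(). Python sets play the role of the sparse vectors, and the final ordering uses total word count across the matching docs as one possible "frequency" metric.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the web: URL -> page text.
CORPUS = {
    "http://example.com/a": "red green blue yellow paint",
    "http://example.com/b": "green red blue purple paint paint",
    "http://example.com/c": "red wine and green salad",
    "http://example.com/d": "blue paint red green",
}


def terms(url):
    """Set of words on the page at url (stands in for u.terms())."""
    return set(CORPUS[url].split())


def query(word):
    """Set of URLs whose page contains word (stands in for G.query())."""
    return {url for url in CORPUS if word in terms(url)}


def google_set_guess(queries):
    """Intersect the result sets, then intersect the terms of the surviving docs."""
    if not queries:
        return []
    # Docs that contain every query term (the ANDed query).
    doc_set = set.intersection(*(query(q) for q in queries))
    if not doc_set:
        return []
    # Words common to every one of those docs.
    common = set.intersection(*(terms(url) for url in doc_set))
    # Assumed ranking metric: total occurrences across the matching docs,
    # with alphabetical order as a tie-breaker.
    counts = Counter(w for url in doc_set for w in CORPUS[url].split())
    return sorted(common, key=lambda w: (-counts[w], w))


if __name__ == "__main__":
    print(google_set_guess(["red", "blue"]))
```

    Note that the query words themselves always come back in the result, since every matching doc contains them by construction; a real system would presumably filter them out, along with very common words.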