Google Experiments
gafferted writes "The boffins at google have been experimenting with new toys, such as Keyboard Shortcuts and glossary, but most fun is Google Sets. Try "green, purple, red" to get a set of 40 different colours. Try a set that contains both Richard Stallman and Bill Gates, see what google associates with Slashdot or ask for a set of rude words."
If you're a huge Google fan (and aren't we all) check out Google Weblog. They had this story 2 days ago, plus they keep you up to date on other cool Google happenings.
And no, it's not my site. I just think it's cool.
daed si luap
I work at a place that is kind of touchy about content served up to those who signed the agreement to be allowed online, and that link to rude words I think needs one of the fark "Not Safe For Work" things after it.
Yeah, the thing doesn't link to boobies, but grepping for incoming text vs. grepping for inbound boobies is a tad easier for log generation.
Besides, I thought rude words just involved being insensitive, not foul.
Wheeeee
If you're talking about the Dilbert thing, it is by design.
;) (You know, the &hl=sv&ie=UTF8 part)
If you're talking about the language thing, it is your mother tongue.
Maybe I should have clarified a little more. Amazon's suggestions never work for me, because I like old school, early nineties grunge and they figure that music sort of falls into "heavy" and "not heavy" categories. So I'm constantly recommended stuff by Metallica, System of a Down, Primus, etc., which are all heavy bands, but which don't fit the particular vein of heavy music I like. Google also made the jump to returning Godsmack, which is a newer hardcore band with a lot of grunge fundamentals (which I like), and Tool, an artsier, more emo-influenced heavy band, but still with grunge influences and ties. I don't think Godsmack would ever refer to themselves as grunge, but the connection exists, and Google figured it out. Extremely impressive: after all, this is a highly subjective area, but one that still has some overall generalizations to it.
Each query phrase produces a set of documents, i.e. web pages. The intersection of those sets gives a small set of docs which is pretty much the same thing that a normal google query (or any search engine) will return, if all the queries are ANDed. Then the new feature is to find the intersection of all the terms from all the docs in the doc-intersection set. That is, return all the terms that are common to all the docs.
e.g. in pseudo-code: Assume
- G is the normal google search engine.
- G.query("search phrase") returns a set of references (URLs) to docs, e.g. {u1, u2, u3, ...}.
- u.terms() returns a set of all the words contained in the doc referenced by u, e.g. if u=="http://slashdot.org", then u.terms() == {"news", "for", "nerds", "slashdot", etc.}.
- * is a set intersection operator.
s1 = G.query(q1); s2 = G.query(q2); s3 = G.query(q3); ...
docSets = s1 * s2 * s3 * ...; // so docSets contains the URLs of the docs that have all the query terms
ws = docSets[0].terms(); // ws will contain the running intersection of the set of words in all the docs
forall url in docSets { ws = ws * url.terms(); }
return ws;
So my guess is that ws is the final set of terms returned by Google Sets. Of course, the words should be sorted by some meaningful metric, e.g. frequency. This is all easy to implement and fast to run, because both the document-set intersection and the word-set intersections can be computed quickly when the documents and words are represented as sparse vectors.
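For anyone who wants to play with the guess above, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the CORPUS dict is a made-up stand-in for the web, query() and terms() are hypothetical helpers that mimic G.query() and u.terms() (query() only matches single words), and guess_set() is just a name for the whole procedure. It says nothing about how Google Sets really works.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the web: URL -> page text.
CORPUS = {
    "http://example.com/a": "red green blue yellow paint red",
    "http://example.com/b": "green red blue purple paint",
    "http://example.com/c": "red wine and green salad",
}

def terms(url):
    """Set of distinct words in the doc at url (stand-in for u.terms())."""
    return set(CORPUS[url].lower().split())

def query(word):
    """URLs of docs containing the word (stand-in for G.query(); single words only)."""
    return {url for url in CORPUS if word.lower() in terms(url)}

def guess_set(*words):
    """Terms common to every doc that matches all the query words."""
    doc_sets = [query(w) for w in words]
    # Docs that contain every query word (the ANDed query).
    docs = set.intersection(*doc_sets) if doc_sets else set()
    if not docs:
        return []
    # Running intersection of the word sets of those docs.
    common = set.intersection(*(terms(u) for u in docs))
    # Rank by total frequency across the matching docs, ties alphabetically.
    freq = Counter(w for u in docs for w in CORPUS[u].lower().split())
    return sorted(common, key=lambda w: (-freq[w], w))

print(guess_set("red", "blue"))
```

With this toy corpus only two pages match both "red" and "blue", so the result is whatever words those two pages share. On real web pages a strict intersection would probably leave mostly stop words, so some looser ranking would likely be needed.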