Slashdot Mirror


Google Programming Contest

AccordionGuy writes: "Google has just announced its first annual programming contest! The objective is to write a program that will do something "interesting" with the roughly 900,000 Web pages' worth of data that Google provides. In addition to writing the program, contestants also have to convince the judges why their program is interesting (or useful) and why it will scale (that is, handle a constantly increasing load of data that grows as the Web grows). The prize is US$10,000 in cash, a V.I.P. tour of the Google facility in Mountain View, California and possibly a chance to run their program on Google's complete billion-Web-page store."

210 of 629 comments

  1. A program that deletes pages. by suso · · Score: 4, Funny

    I think I'll write a program that will delete pages as it finds them. This should scale pretty nicely and make the web faster in the process.

  2. I know what someone should make! by Cruciform · · Score: 4, Troll

    How about adding the option to have google understand what I *mean* to search for, not what I tell it to search for.

    Oh, and the ability to find one non-fake Britney porn pic.

    1. Re:I know what someone should make! by foobar104 · · Score: 3, Interesting

      How about adding the option to have google understand what I *mean* to search for, not what I tell it to search for.

      You might have been kidding, but you've got a really good idea there.

      How about semantic searching: equip Google with a database that organizes words in a relational hierarchy from the general to the specific. For example, "orange" is a more specific form of "fruit," and also a more specific form of "color."

      When you search for "orange," Google might also have the ability to search for "fruit" and "color," depending on how broad you want your search to be.

      Just a thought.
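
A minimal sketch of the general-to-specific idea above, in Python. The `HYPERNYMS` table and the `expand_query` helper are hypothetical names invented for illustration; nothing here reflects how Google actually works, and a real system would need a far larger lexical database.

```python
# Hypothetical hand-built hierarchy mapping a word to its more general forms.
HYPERNYMS = {
    "orange": ["fruit", "color"],
    "fruit": ["food"],
    "color": [],
    "food": [],
}


def expand_query(term, levels=1):
    """Return the term plus its more general forms, up to `levels` steps up."""
    terms = [term]
    frontier = [term]
    for _ in range(levels):
        next_frontier = []
        for word in frontier:
            for parent in HYPERNYMS.get(word, []):
                if parent not in terms:
                    terms.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return terms
```

The `levels` argument stands in for "how broad you want your search to be": 0 searches only for the word itself, and each extra level climbs one step toward more general terms.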

    2. Re:I know what someone should make! by Anonymous+DWord · · Score: 2

      Hey, I may have slept through umpteen calculus and matrix algebra classes, but since when does 0 equal 1?

      --
      "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
    3. Re:I know what someone should make! by Sancho · · Score: 3, Insightful

      I'd just like the ability to use regular expressions in my search...and maybe also have a localization function where I could require that certain search terms be within so many words of each other.

      Erik
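
The "within so many words of each other" requirement could be sketched roughly as below. This is only an illustration that scans one page's raw text; `within_distance` is a made-up helper, and a real engine would presumably store word positions in its index rather than re-reading pages.

```python
import re


def within_distance(text, term_a, term_b, max_words):
    """True if term_a and term_b occur within max_words words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every occurrence of one term against every occurrence of the other.
    return any(abs(i - j) <= max_words for i in pos_a for j in pos_b)
```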

    4. Re:I know what someone should make! by shogun · · Score: 2

      "Did you mean orange the fruit, orange the color, orange the tree, or orange the river"
      Great idea, but how is the search engine going to tell which meaning of the word it's looking for in context? Now that would be a very useful step if you can find a nice way to do it.

    5. Re:I know what someone should make! by big_hairy_mama · · Score: 2

      As great as that would be, the first person to write a regular expression engine that can process a petabyte of indexed web-pages in 0.25 seconds should get more than $10,000!

      The problem is that search engines use pre-indexed tables of words, probably one table for every word used by any page anywhere. Regular Expressions have to process the raw data, which wouldn't scale worth a dime.
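
To make the contrast concrete, here is a toy version of a pre-indexed word table (an inverted index). The names `build_index` and `search` are illustrative assumptions, not anything Google has published. The point is that a word query becomes a few set lookups, whereas a regular expression would have to scan the raw text of every page.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```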

    6. Re:I know what someone should make! by foobar104 · · Score: 2

      So now when I search for "Ornage" it asks me "Did you mean orange." I guess Google could extend this if it was hooked up to WordNet - "Did you mean orange the fruit, orange the color, orange the tree, or orange the river"

      EXACTLY! Like that scene in 2010:

      CHANDRA
      I would like to open a new file. Here is the name for it. [types "phoenix"] Do you know what that means?

      SAL
      There are twenty five references in the current encyclopedia.

      CHANDRA
      Which one do you think is relevant?

      SAL
      The tutor of Achilles?

      CHANDRA
      That's very interesting, I didn't know that one. Try again.

      SAL
      A fabulous bird, re-born from the ashes of its earlier life.

      CHANDRA
      That is correct.

    7. Re:I know what someone should make! by Nightpaw · · Score: 2

      Well, once it pulls out the pages with all the search terms, then it can send them over to some dedicated processors to check nearness. I'm sure people would be willing to wait 5 minutes for it, especially if it gave you a window with the straight-up regular-style results in the meantime.

    8. Re:I know what someone should make! by foobar104 · · Score: 2

      You're being too simplistic, probably because of my hurried and incomplete example. Semantic searching is most useful in the "general to more specific" instances. While searching for "orange" shouldn't necessarily search for "fruit," as you point out, it's very likely that a search for "fruit" should key off of "orange."

      This is really much more applicable to concept searching than it is to simple text indexing. (Mmm... SimpleText...)

      For example, if a cataloger is describing a painting, she might use the word "orange" to describe the subject of a still life. In that instance, the painting's metadata structures would reflect the fact that "orange" in the subject field must be the concrete noun "orange," which is a specific instance of the concrete noun "fruit." The idea being that a user could search for "fruit" and get a hit on a painting that has been described as "still life with oranges."

      So maybe my idea isn't all that applicable to Google after all. Hell, I'm not even sure I'm on topic any more. ;-)

    9. Re:I know what someone should make! by Alan · · Score: 2

      Yup, re-implementing the 'electricmonk', a search engine that I used almost as much as google when it was still alive.

      For those who don't remember it or have never heard of it, it was a natural language parser search engine that would handle searches like "how do I make tomato soup" or "what were the greatest inventions of alexander graham bell". Much easier to type in what you *mean* instead of things like "'alexander graham bell' + greatest !+inventions" or such like.

      The problem with electricmonk.com was that it didn't have the huge resource to search from that google did. A combination of the two could be incredibly kick ass, especially if it was just an option to type in on the main search bar! ("why does my kernel break with a foo.o error?")

    10. Re:I know what someone should make! by h2odragon · · Score: 2
      Example: "imminent domain" .

      I'm working on it... Google surely isn't buying it from me for a chance at $10k though.

      Britney pr0n i cant help with. text only. sorry.

    11. Re:I know what someone should make! by spudnic · · Score: 2

      Maybe so, but would Google be willing to give you the 5 minutes of processing time this exhaustive search takes? Get 20 or 30 thousand people doing it at once and that would require some heavy duty hardware upgrades.

      --
      load "linux",8,1
  3. The average color of the WWW by I+am+the+blob · · Score: 5, Interesting

    Much like the recent discovery of the average color of the universe, this would be a pointless, but fun, use of the data. Of course, I'm not sure exactly what to average. Do you take into account browser real-estate a particular color occupies? Do you simply average each color= and stylesheet instance?

    Ideas?

    --

    All sweeping generalizations suck.
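
One possible answer to the "what to average" question, sketched in Python: treat each color as a `#rrggbb` value paired with a weight. The weight is an assumption left to the caller; it could be 1 per `color=` or stylesheet instance, or an estimate of the screen area the color occupies. `average_color` is a hypothetical helper, not part of any existing tool.

```python
def average_color(colors):
    """colors: list of (hex_string, weight). Returns the weighted mean as '#rrggbb'."""
    total = sum(weight for _, weight in colors)
    if total == 0:
        raise ValueError("no weighted colors to average")
    channels = [0.0, 0.0, 0.0]
    for hex_color, weight in colors:
        value = hex_color.lstrip("#")
        for i in range(3):
            # Accumulate red, green and blue separately, scaled by the weight.
            channels[i] += int(value[2 * i:2 * i + 2], 16) * weight
    return "#" + "".join(f"{round(c / total):02x}" for c in channels)
```

Averaging raw RGB channels is the simplest choice; whether that matches perceived color is a separate question this sketch ignores.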
    1. Re:The average color of the WWW by negativekarmanow+tm · · Score: 4, Funny

      I think it's more in the skincolor/pink region.

      --
      No security through obscurity: my password is goatse. Stop me before I troll again.
    2. Re:The average color of the WWW by AtrN · · Score: 2

      Well, for starters. 95% of it is brown and sticky.

    3. Re:The average color of the WWW by Perdo · · Score: 2

      World Opinion "color": Parse the web for specific text strings to determine public opinion. I could do a search for "Dogs are great" -vs- "Cats are great" And compare results bar chart style. Some of the neatest information I ever see on google is the Zeitgeist. Now, imagine being able to pick the topics. Used properly it could help predict trends in business, politics and (world) public opinion.

      --

      If voting were effective, it would be illegal by now.
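
A rough sketch of the phrase comparison described above, assuming the pages are available as plain strings. `phrase_counts` and `bar_chart` are invented names, and counting pages that contain a phrase is only a crude proxy for opinion.

```python
def phrase_counts(pages, phrases):
    """Count how many pages contain each phrase (case-insensitive)."""
    return {p: sum(p.lower() in text.lower() for text in pages) for p in phrases}


def bar_chart(counts):
    """Render the counts as one text bar per phrase."""
    return "\n".join(f"{phrase:<16} {'#' * n} {n}" for phrase, n in counts.items())
```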

  4. Well this is strange by The+Bungi · · Score: 3, Insightful

    10K is nice along with the recognition and all, but... I'm sure that's a lot cheaper than paying a few Google staff coders to come up with the same thing in a few months.

    Jus' being paranoid.

    1. Re:Well this is strange by plalonde2 · · Score: 4, Insightful
      More to the point though is that it gives Google a great pool of potential employees. That should be of greater benefit to Google than the ideas.

      Always think of the potential of hiring people with good ideas, rather than buying the ideas outright.

      Geese and golden eggs, and all that.

  5. This is brilliant by jkujawa · · Score: 2, Insightful

    Evil, but brilliant.

    Get hundreds of people to crank out code for you, pay a paltry sum to one of them, keep all the code. Pay $10K for millions of dollars in potential technology.

    That's about the slickest thing I've ever seen. You have to admire them for their evil. Microsoft could learn a thing or ten from them.

    1. Re:This is brilliant by dotderf · · Score: 3, Insightful

      It's not evil, it's just business. Other companies have been doing it for years. Back in the day, car companies used to sponsor "car design" contests for little kids. The winner would get $50 and his car would be whisked away to the labs. Why pay a team of designers and engineers to do what a trained^H^H^H^H^H^H^H normal person would do for cheap? Maybe we'll get a spiffy new feature on google! Hurrah!

    2. Re:This is brilliant by JordanH · · Score: 3, Insightful
      • You have to admire them for their evil. Microsoft could learn a thing or ten from them.

      What's evil about it? Smart maybe, but evil?

      Anybody who would enter such a contest is primarily motivated by the challenge, I would think. Getting the $10K gives you bragging rights is all.

      Sure, Google gets some value, but a lot of highly motivated programmers get a challenging problem.

      If all good programmers were primarily motivated by money, there'd be no Linux, BSD, Apache, Emacs, Vim...

      I reserve evil for things that actually hurt someone. This seems like a win-win to me.

    3. Re:This is brilliant by saint10 · · Score: 5, Funny

      Better yet, post a story to slashdot about a contest with a prize of 10k, read all the responses modded at 4 and above, spend a weekend coding a few of em up, and cash in!

      Now that's evil!

    4. Re:This is brilliant by slam+smith · · Score: 4, Insightful

      The key word here is potential. I think that you would almost waste more money in evaluating a lot of the trash that comes in. The most valuable thing they probably will get from it are the ideas that people come up with. Notice how they made it as open ended as they could.

    5. Re:This is brilliant by epsalon · · Score: 5, Informative

      If you read the rules, you will see that you don't even have to assign copyrights to Google. You only have to give them a license. This means you can GPL your code or even BSD it. Sounds fair to me.

    6. Re:This is brilliant by TheAwfulTruth · · Score: 3, Insightful

      You just described open source exactly. Except the part about paying ANYTHING at all. Pretty slick!

      --
      Contrary to popular belief, coding is not all free blow-jobs and beer. Those things cost MONEY!
    7. Re:This is brilliant by ameoba · · Score: 2

      Well at least it's not like the guys at Software Carpentry who got a government contract and held a 'coding contest' to see who could write their toolset for them.

      --
      my sig's at the bottom of the page.
    8. Re:This is brilliant by digitalunity · · Score: 2

      Yes you could GPL it yourself. However, the BSD license would be excluded. On the page itself, it says that the only submissions that will be accepted are GPL'd.

      --
      You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
    9. Re:This is brilliant by Tom7 · · Score: 5, Interesting

      Unfortunately, all the comments at 4 and above are complaining about how Google intends to rip people's ideas off.

    10. Re:This is brilliant by SmittyTheBold · · Score: 2

      Well, then...help Google rip people off! Genius!

      make a Java applet that pops up a dialog..."Hello from the accounting department of [ISP Name]. We need to re-verify your account information. ..."

      It could even use the Google index to take the reverse-lookup of the person's IP, then locate the ISP's real name.

      --
      ± 29 dB
  6. Usefulness? by Hi-Tech+Redneck · · Score: 2, Interesting

    I'm honestly curious as to what kind of useful programs could be run on that collection of pages and still be interesting? Statistical Analysis? Boring! Or maybe market analysis? Again, BORING! Some of the more trivial interesting things, like how much of phrase or word x appears on the internet couldn't really be termed useful... Hopefully, somebody will prove me wrong. Good luck to all you developers...

  7. What a coincidence! by ctkrohn · · Score: 2, Interesting

    I was just talking to someone on IRC, and we were playing a game with Google. You had to find two correctly spelled words which would obtain a page or less of results. He mentioned that a distributed client which searches for the longest string of words returning less than a page would be a cool idea.

    Just a thought...

    1. Re:What a coincidence! by Alsee · · Score: 2

      Pretty easy. Got it on my second try.

      It kinda helps to happen to know that the very last word in the scrabble player's dictionary is zyzzyva (requiring the only Z, both Y's, both BLANK's, and half of the V's. 75 points with the seven tile bonus LOL). I even remember that zyzzyva is a tropical weevil, hehe.

      My immediate reaction for a second word was aardvark, but bad choice. Several occurrences of "from aardvark to zyzzyva".

      For my second try I went with meteorology which returned 2 matches (plus 1 redundant match not displayed). The first match was this evil page. (WARNING - 9.1 MEG TEXT DATAFILE)

      Then I found zyzzyva herpetology which returns no matches. Herpetology: study of reptiles and amphibians.

      P.S.
      People just can't resist challenging you when you put a word vertically simultaneously creating 4 or 5 two letter words horizontally, and announce that "oe" is a whirlwind off the Faeroe islands :)

      Y,IAAG. (Yes, I am a geek) I created a list of all (96) legal 2 letter words. I haven't memorized them all though. Perhaps because no one seems to want to play scrabble with me :)

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  8. Hmmm... by Kjella · · Score: 3

    10000$/x hours of work we could get done for us...

    Make sure we get a slashdot posting so a bunch of geeks with programming skills will enter.

    The only thing I'd want is for google to stay just the way it is though, don't bloat. Great service, maybe I'm just pessimistic but sites rarely do everything well.

    Kjella

    --
    Live today, because you never know what tomorrow brings
  9. Free Programming(or nearly free)... by yonnage · · Score: 2, Insightful

    Sounds to me that google is getting lots of programs for only $10k and a tour.

  10. Some Inspiration by Eloquence · · Score: 5, Insightful
    A lot of implicit rating data can be gathered from the links pointing to a page. Google is already doing this when sorting the search results (frequently linked-to pages rank higher). It would be interesting to see how this could be used to detect very popular new sites. I sent this mail to Google a while ago:

    Hi,

    it occurred to me, since you are evaluating the number of links pointing to a page anyway, that it would be a very nice thing to have a sort of "Top 40 Links of the Day" page, regularly updated to include only new and unique stuff. You could use an algorithm similar to the one used by

    http://blogdex.media.mit.edu/

    or

    http://www.daypop.com/

    Both of these sites have become immensely popular through this feature (in the case of Daypop, I find http://www.daypop.com/top.htm very valuable), and I think it would also be a great addition to Google. I don't think inappropriate content would be much of a problem since it would hardly show up high on the list, and besides, a top 40 list can be looked through by a human.

    What do you think?

    Of course this could be spammed, but as I said, a human could filter the results every day; besides, it would be hard to create a very large number of unique links from different servers pointing to a page. I'm sure Google is already doing some of this to prevent spamming their search-order algorithm anyway.
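
As a hedged sketch of the "Top 40 Links of the Day" idea: compare two crawls and rank targets by how many distinct hosts newly link to them. The input format (pairs of source and target URLs) and the function name `top_new_links` are assumptions made for illustration. This is not the algorithm Blogdex, Daypop, or Google uses.

```python
from collections import defaultdict
from urllib.parse import urlparse


def top_new_links(yesterday, today, limit=40):
    """yesterday, today: iterables of (source_url, target_url) link pairs.

    Rank targets by the number of distinct source hosts that newly link to them.
    """
    old = set(yesterday)
    hosts = defaultdict(set)
    for source, target in today:
        if (source, target) not in old:
            # Count hosts, not pages, so one server cannot stuff the list.
            hosts[target].add(urlparse(source).netloc)
    ranked = sorted(hosts.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(target, len(h)) for target, h in ranked[:limit]]
```

Counting hosts instead of raw links is a small nod to the spam concern above: many links from a single server count only once.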

    1. Re:Some Inspiration by jimbo3123 · · Score: 2, Informative

      it occurred to me, since you are evaluating the number of links pointing to a page anyway, that it would be a very nice thing to
      have a sort of "Top 40 Links of the Day" page, regularly updated to include only new and unique stuff. You could use an
      algorithm similar to the one used by


      It's Called Google Zeitgeist.

      It is at:
      Zeitgeist[Google.com]

      --
      There should be a moderation category "Dumbest Comment EVER"
    2. Re:Some Inspiration by costas · · Score: 3, Interesting

      I hate to link a beta-level site from /., but that's exactly what I am trying out...

  11. Cool, but..... by IamTheRealMike · · Score: 2, Insightful

    This sounds really great doesn't it? 10,000 USD cash prize, visiting their facilities (who wouldn't be curious to see the world's biggest Beowulf cluster) and more.

    Thing is, though that is a lot of money, what happens if you make them, say 20,000 USD with a great new compression/analysis algorithm.

    What then? You have no claim to a part of their profits. I guess that's just a part of competing to give your ideas to a company.

    -mike

    1. Re:Cool, but..... by Chmarr · · Score: 2

      Especially considering that Google gets to 'own' all the entries, and not just the winning one.

      Hey... it worked for Microsoft (Their 'Compression' contest)

    2. Re:Cool, but..... by anthony_dipierro · · Score: 3, Insightful

      Thing is, though that is a lot of money, what happens if you make them, say 20,000 USD with a great new compression/analysis algorithm.

      If you're that good, they'll probably hire you to at least consult for them to maintain the code you wrote.

  12. Googlewhacking by waldoj · · Score: 4, Informative

    An automated Googlewhacking system.

    Ingenious!

    -Waldo Jaquith

    1. Re:Googlewhacking by angst_ridden_hipster · · Score: 2

      Here's a valid one:

      limaceous cretin

      (until this page gets indexed... get it while it's fresh!)

      --
      Eloi, Eloi, lema sabachtani?
      www.fogbound.net
    2. Re:Googlewhacking by Compuser · · Score: 2

      gipsy + colonoscopy

      That took 1 minute of trying. Guess not that
      hard of a pursuit.

    3. Re:Googlewhacking by Alsee · · Score: 2

      Folliculitises Jeezus christ! A word that only appears on the internet once, and even then only in a list of all words.

      I propose that any word that does not exist on the internet (lists of words excluded) be declared no longer a real word.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  13. So basically... by Dutchmaan · · Score: 3, Insightful

    They're going to (hopefully) get tons of interesting ideas and almost as much useful code for the price of $10,000. Sure beats hiring programmers.

    That's assuming that any contest entries automatically become the property of Google.

    Perhaps this is the evolution of a new business model... Either way, I don't really care as long as Google remains free, fast, and useful!

    1. Re:So basically... by anthony_dipierro · · Score: 5, Informative

      That's assuming that any contest entries automatically become the property of Google.

      With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry

      So basically, google doesn't own your code, only the right to use it. GPLing your code would satisfy the worldwide, perpetual non-exclusive license grant.

    2. Re:So basically... by geekoid · · Score: 2

      they do, but many companies have done this in the past.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    3. Re:So basically... by PurpleBob · · Score: 2

      I'm sure the world will be eternally grateful when you release your GPLed code which only works when it's hooked up to Google's database.

      --
      Win dain a lotica, en vai tu ri silota
  14. Notice their contest agreement? (was Re:Well th..) by Christianfreak · · Score: 2
    From the agreement:

    With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry.


    Hey Google! Why not make the agreement state that all entries go under the GPL?
  15. The biggest Dictionary by p-n-wise · · Score: 4, Interesting

    I'd go for a dictionary of every word ever used on the web. Complete with common usage examples.

    --
    I am the NUL and the DEL, the beginning and the end.
    1. Re:The biggest Dictionary by dillon_rinker · · Score: 2

      There is no such thing as correct spelling. There is only consensus. Those big brown things with green bits all over them...do you think that 2,000 years ago the correct spelling was "TREE"?

      This is actually one of the most interesting ideas I've seen...develop a database that dictionary writers can use.

    2. Re:The biggest Dictionary by pmc · · Score: 2

      Of course not - the correct spelling is "HILL"

    3. Re:The biggest Dictionary by dillon_rinker · · Score: 2

      A good dictionary also serves a normative function

      Have you ever studied how to write a dictionary?

      You can't just accept any old spelling and usage of words on the net, or you wind up in Humpty Dumpty world where anyone is allowed to use and spell words however they choose
      No, but if you find that 99% of the population is spelling "enough" as "enuff", then maybe your dictionary is out of date.

      After all, if anyone is allowed to define a new meaning and/or spelling for a word at will there is no point in collecting those spellings and meanings anymore as they are subject to arbitrary change
      NEWS FLASH! What you seem so afraid of is how language works.
      - I am the baddest dude on the block
      - That new car is the bomb
      - Shutup! (synonyms: "Get out!" "You're kidding me!" "That is too good to be believed")

      Granted, these are currently slang (ie used when speaking informally, and usually not written except as dialog) but that is how words start...

      BTW, if people can't arbitrarily change the meaning of a word, then tell me how "computer" came to have its current definition as an electronic device. 75 years ago a computer was a person who performed computations.

  16. I know! by AntiFreeze · · Score: 2, Interesting
    Someone could do a CRC (cyclic redundancy check) on all the pages in the cache, that way, one could tell when the Internet's been updated...

    Even Stupider: Not only easy, but it could allow google to create static result pages for common searches: it would just update the result page when the cache CRC changes.
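
For what it's worth, the checksum itself is easy to sketch with Python's standard `zlib.crc32`, which accepts a running value as its second argument. The `cache_checksum` name and the dict-of-bytes input are assumptions for illustration only.

```python
import zlib


def cache_checksum(pages):
    """pages: dict of url -> page bytes. Returns one CRC-32 over all pages."""
    crc = 0
    for url in sorted(pages):  # fixed order so the result is repeatable
        crc = zlib.crc32(url.encode("utf-8"), crc)
        crc = zlib.crc32(pages[url], crc)
    return crc
```

A cached result page could then be regenerated only when the stored checksum differs from a freshly computed one.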

    --

    ---
    "Of course, that's just my opinion. I could be wrong." --Dennis Miller

  17. map of the internet, using the internet... by edrugtrader · · Score: 3, Interesting

    how about have google parse every page, and save the homepage as an image. then take the map of the internet, and make it using tiny thumbnails of the most heavily linked (popular) sites.

    this would be just like those mosaic photos, only much nerdier. thinkgeek execs are drooling already....

    --
    MARIJUANA, SHROOMS, X: ONLINE?! - E
  18. How about a FPS game? by t0qer · · Score: 3, Interesting

    A few years back there was a game, I think it was called Virus or something like that. It would scan your directory structure and make a map for the FPS world based on that.

    Looking at the web, I always thought it would be cool to make a game based on the same concept, but use web pages instead of your hard drive directory.

    I'm just throwing out ideas.

    1. Re:How about a FPS game? by BlueGecko · · Score: 2
      A few years back there was a game, I think it was called Virus or something like that. It would scan your directory structure and make a map for the FPS world based on that.
      So if you were standing in the C: room and unleashed a flurry of rockets, was that equivalent to rm -rf /?
    2. Re:How about a FPS game? by mlk · · Score: 2
      --
      Wow, I should not post when knackered.
    3. Re:How about a FPS game? by tswinzig · · Score: 2

      A few years back there was a game, I think it was called Virus or something like that. It would scan your directory structure and make a map for the FPS world based on that.

      Looking at the web, I always thought it would be cool to make a game based on the same concept, but use web pages instead of your hard drive directory.


      I always thought it would be cool if Quake became a user interface for an operating system. Just imagine, instead of encrypting your files, you would just put them in a room guarded by bad-ass monsters and surrounded by a moat of molten lava.

      You could organize your files by rooms in your house. No wait -- you could have a filing cabinet in one of these rooms, filled with folders. In each folder you could have documents that you've 'created'. If you want to get rid of a document, just drop it in the 'waste basket.'

      I think I'm on to something here.

      --

      "And like that ... he's gone."
  19. Google Press Release by wizarddc · · Score: 5, Funny

    Google Contest Winner Offers Better Porn Searches

    Winner of the First annual Google Programming Contest creates greatest porn spider ever.

    MOUNTAIN VIEW, Calif. - December 11, 2001 - Google Inc., developer of the award-winning Google search engine, today announced its first winner of the Annual Google Programming Contest. Winner I. C. Porno has created a program to help catalog and organize Google's cache of the Internet, also referred to as the World Wide Web of Porn.

    "This announcement is an important step in Google's ongoing effort to provide search services that are fast, easy to use, and that help people find the information they need," said Larry Page, Google's co-founder and president of Products. "To search our collection of 3 billion documents for porn by hand, it would take 5,707 years, searching twenty-four hours per day, at one minute per document. With I. C.'s new program, it takes less than a second."

    World's Largest Collection of Porn
    Google users now have the world's largest and most comprehensive collection of porn right at their fingertips and can immediately satisfy primal urges using the following services:

    Google Web Porn Search: The company's newest search service now offers more than 2 billion documents - 25 percent of which are non-English language web pages. Google Web Search also offers users the ability to search for numerous non-HTML files such as PDF, Microsoft Office, and Corel documents. Google's powerful and scalable technology searches this comprehensive set of information and delivers a list of relevant porno in less than half-a-second.

    Google Porn Groups: This 20-year archive of Usenet porn conversations is the largest of its kind and can serve as a powerful reference tool, while offering more porno than the Internet. Google Groups was released from beta today with 700 million postings in more than 35,000 topical porno categories.

    Google Image Search: Comprising more than 330 million nude images, Google Image Search enables users to quickly and easily find porn images relevant to a wide variety of topics, including pictures of celebrities and popular travel destinations. Advanced features include search by image size, format (JPEG and/or GIF), coloration, and the ability to restrict searches to specific genres of porn.

    About Google Inc.
    With the largest index of websites available on the World Wide Web and the industry's most advanced search technology, Google Inc. delivers the fastest and easiest way to find relevant information on the Internet. Google's technological innovations have earned the company numerous industry awards and citations, including two Webby Awards; two WIRED magazine Readers Raves Awards; Best Internet Innovation and Technical Excellence Award from PC Magazine; Best Search Engine on the Internet from Yahoo! Internet Life; Top Ten Best Cybertech from TIME magazine; and Editor's Pick from CNET. A growing number of companies worldwide, including Yahoo! and its international properties, Sony Corporation and its global affiliates, AOL/Netscape, and Cisco Systems, rely on Google to power search on their websites. A privately held company based in Mountain View, Calif., Google's investors include Kleiner Perkins Caufield & Byers and Sequoia Capital. More information about Google can be found on the Google site at http://www.google.com.

    --
    Th
    1. Re:Google Press Release by tmarzolf · · Score: 2, Funny
      Google Web Porn Search: The company's newest search service now offers more than 2 billion documents - 25 percent of which are non-English language web pages. Google Web Search also offers users the ability to search for numerous non-HTML files such as PDF, Microsoft Office, and Corel documents.

      For all that Corel formatted porn out there...

      --

      This Sig has been depreciated.

    2. Re:Google Press Release by Cyn · · Score: 3, Funny

      Brings new meaning to Google's "I'm Feeling Lucky" search option.

      --
      cyn, free software and *nix operating systems enthusiast.
    3. Re:Google Press Release by armb · · Score: 2

      > Google Contest Winner Offers Better Porn Searches

      Actually I think the existing page ranking mechanism could be adapted for identifying porn sites. You can identify a few existing sites as porn, and/or full of porn links. From there we use the fact that most links to and from porn related sites are from or to other porn related sites (we can use keywords as well of course) to give rankings.
      Add in measurements of "how many pictures and movies can we reach on this site without being asked for a credit card number", and maybe some analysis of the javascript to penalize pop-ups and onclose methods, use the existing indexing, classification, and image search stuff, and there you are.

      --
      rant
  20. Re:Notice their contest agreement? (was Re:Well th by benwb · · Score: 5, Insightful

    Notice that they don't say exclusive license. You should be able to release it as GPL yourself.

  21. one word (or maybe two): spellcheck by option8 · · Score: 4, Interesting

    i actually bugged the google guys a while ago about adding a spellchecking function to google. throw a URL or a set of pages at it, and it spits out a list of misspelled or questionable words - highlighted in the way they already do search terms in the cache...

    anyway, someone there emailed me back basically saying it was an interesting idea, but not something on their agenda.

    maybe someone out there can work up a scalable google spellchecker that i can run my big-ass database-driven website through (which is a major pain to spellcheck, considering the client simply refuses to do so when they provide the content)

    1. Re:one word (or maybe two): spellcheck by PurpleBob · · Score: 2

      So why do you need a database of billions of web pages to do that? You only need to hook a spellcheck program up to a word list, not the entirety of the Internet.

      --
      Win dain a lotica, en vai tu ri silota
    2. Re:one word (or maybe two): spellcheck by option8 · · Score: 2

      no. something like a query string like so:

      spellcheck:http://slashdot.org

      and the resulting page(s) would highlight all the misspelled or questionable words on the page.

      ideally, it could also do this as it spidered a site, and, if the robots.txt or some other means of subscribing were in place, could email the webmaster that X words on page Y are misspelled, and here is the list, with suggested spellings. click here (link to aforementioned query string) to see the misspelled words highlighted.

      that is what i mean.
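
      A minimal sketch of what such a spellcheck: query might do on a single page, assuming the page text has already been fetched and a word list is at hand. The function names, the tiny word list, and the ** markers used for highlighting are all made up for illustration; nothing here is an existing Google feature.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")

def find_questionable_words(text, known_words):
    """Return words from text that are not in known_words (lowercased, in order, no repeats)."""
    seen = set()
    flagged = []
    for word in WORD_RE.findall(text):
        key = word.lower().strip("'")
        if key and key not in known_words and key not in seen:
            seen.add(key)
            flagged.append(key)
    return flagged

def highlight(text, flagged):
    """Wrap every flagged word in ** markers, leaving the rest of the text untouched."""
    flagged_set = set(flagged)

    def mark(match):
        word = match.group(0)
        return f"**{word}**" if word.lower().strip("'") in flagged_set else word

    return WORD_RE.sub(mark, text)

if __name__ == "__main__":
    known = {"the", "quick", "brown", "fox", "jumps"}  # stand-in for a real word list
    page = "The quikc brown fox jumsp"
    bad = find_questionable_words(page, known)
    print(bad)
    print(highlight(page, bad))
```

      The same list of flagged words could feed the email-the-webmaster step; suggesting corrections would need an edit-distance lookup on top of this.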

  22. Create a gene sequencer by gosand · · Score: 2

    Count all of the letters A, T, C, and G from all the web pages in the search results and sequence that into a DNA strand to produce the perfect human. Myuhahahahahaha.

    --

    My beliefs do not require that you agree with them.

  23. Restoring meta-tags by Charles+Dodgeson · · Score: 5, Interesting
    I've been kicking around an idea for a scheme to end meta-tag (keyword, description) abuse so that they can actually become useful again. But it would require the cooperation and effort of google (and others) to do this.

    The idea is roughly to refuse to index sites which engage in keyword/description abuse.

    1. index keywords and description data
    2. Allow users to search with keywords on or off
    3. If users search with keywords on, provide a mechanism for users to nominate a site as engaging in keyword abuse.
    4. semi-automatically, and then manually, review nominations.
    5. Refuse to index sites which have engaged in keyword abuse.
    This isn't so much a system that meets the specs of the contest. And there is a scaling issue, but it is on my wish-list for google (and others) to do.
    --
    Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
    1. Re:Restoring meta-tags by truesaer · · Score: 2
      Maybe this is already possible, or being done. I would imagine that the sites that link to you are likely to have similar meta tags. Not all, but in general. Now, google could potentially come up with an algorithm that scores how related your meta tags are, and then based on that score weights the keywords in meta tags when you do a search.

      In other words, if I sell fish tanks and I have meta tags for porn, britney spears, etc. etc. to attract mistaken visitors, and everyone who links to me has fish related meta tags, then you could give the meta tag on this site a bad score, and penalize it accordingly when searching on porn terms.
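
      One hedged way to put a number on "how related your meta tags are": take the Jaccard overlap between a page's keywords and the keywords of each page linking to it, then average. The choice of Jaccard, the function names, and the sample keyword sets are assumptions for the sake of a sketch, not anything Google is known to do.

```python
def jaccard(a, b):
    """Overlap between two keyword collections: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def meta_trust_score(page_keywords, inbound_keyword_sets):
    """Average overlap between a page's meta keywords and those of the pages linking to it."""
    if not inbound_keyword_sets:
        return 0.0  # no inbound links, so nothing to compare against
    total = sum(jaccard(page_keywords, kws) for kws in inbound_keyword_sets)
    return total / len(inbound_keyword_sets)

if __name__ == "__main__":
    linkers = [{"fish", "aquarium"}, {"fish", "tanks", "pets"}]
    honest = meta_trust_score({"fish", "tanks", "aquarium"}, linkers)
    stuffed = meta_trust_score({"porn", "britney", "fish"}, linkers)
    print(honest, stuffed)  # the keyword-stuffed page scores lower
```

      A search could then multiply the weight of a meta-tag match by this score, so stuffed keywords count for little.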


      I'm sorry for that terribly long sentence. Anyway, this might be "interesting"

  24. What about a program to get rid of frontpage? by Ieshan · · Score: 3, Funny

    How about a program that searches for the meta generator tags and looks for "Microsoft Frontpage X.X", deletes the page from the database, and commences a DOS attack from the rest of the slashdot community?

    Go Google! Get rid of the fake HTML goons!

  25. The entire internet on a floppy by KenSentMe · · Score: 2, Interesting

    Something to think about... you know that cool caching feature that google has? That basically means they have the entire internet saved on their disk array. Seriously though, I've been doing a lot of work and research in the area of neural nets, fuzzy logic, evolutionary algorithms, etc. etc. I wouldn't mind feeding 900,000 webpages into a neural net, and seeing how well it learns, or *what* it learns.

    1. Re:The entire internet on a floppy by singularity · · Score: 2

      I have had thoughts in this matter, especially since Google now has a fairly complete archive of Usenet postings. These are discussions between two or more individuals, for the most part.

      Imagine what a neural net, geared towards looking at online communication between people, could do with that amount of discussion.

      It would also help that Usenet postings tend to be better sorted and, up until just a few years ago, had a relatively high signal/noise ratio.

      --
      - (c) 2018 Hank Zimmerman
    2. Re:The entire internet on a floppy by rgmoore · · Score: 3, Interesting

      I'm not sure if using USENET is such a great idea. While there are some areas where it has a great signal to noise ratio and intelligent commentary, there are a ton of places where it's simply awful. It's loaded with misinformation, flameage, and proof of the correctness of Godwin's Law. I doubt that I'd be very excited about chatting with a bot that learned to communicate by reading the USENET archives.

      OTOH, you might be able to do some very clever work on using the page cache as a knowledge store for a chatbot. You'd just take the incoming message, try to find some keywords in it (probably using previous parts of the conversation to help) and use them to search Google for relevant information. Then you'd reformat the information you found into something like a conversational reply and send it.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    3. Re:The entire internet on a floppy by PurpleBob · · Score: 2

      Watch out. Just by bringing that up, you're likely to get an inane response by Mentifex about how you should feed the webpages into his "mind" written in Visual Basic^W^W JavaScript, which of course is the most important AI development of the year 2001^W 2002, the Year When AI Is Reborn, despite the fact that it is nothing but a lookup table.

      Though, hopefully, this post will prevent it.

      --
      Win dain a lotica, en vai tu ri silota
    4. Re:The entire internet on a floppy by singularity · · Score: 2

      The nice thing about Usenet is how it is divided up in different groups.

      To start small, you could just use *.moderated groups. This will help assure that you are going to get really good signal/noise ratio.

      From there you could do some relatively easy research to see what groups tend to keep better signal/noise ratio. The group I keep the FAQ for, for example, comp.mail.eudora.mac, has a really good ratio.

      From well researched groups, you could move on to the entire comp.* and sci.* groups.

      Of course you would almost have to save alt.* for last.

      The other thing you could do is divide it by time. For example, stuff posted before 1992 or so is going to have a better ratio than stuff posted after 1998 or so.

      Hopefully your system would be good enough to be able to throw out some of the trash.

      it would not be easy, but I think that it would be a great use of that stored information.

      --
      - (c) 2018 Hank Zimmerman
  26. Why not by dmouritsendk · · Score: 3, Funny

    Make an image-2-asciiart converter, so you could have a txt-only option on the google cache.

  27. Don't post them or they'll be Googlewhackwhacked by clary · · Score: 5, Funny

    The Googlewhacking site lists reader-submitted Googlewhacks...which of course causes Google to pick up a second site for the search. And so the Googlewhack is whacked!

    --

    "Rub her feet." -- L.L.

  28. jargon watcher by MbM · · Score: 5, Interesting

    Write an application to track keyword usage over time, when a keyword goes from only 10 hits to several thousand then flag it for jargon. The jargon can then be presented as a webpage of the top whatever with various statistics over popularity and suspected origin urls.
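
    A rough sketch of the flagging step, assuming per-term hit counts from two crawls are already available as dictionaries. The thresholds (at most 10 hits before, at least 1000 after) just mirror the numbers in the post; the function name and sample data are invented.

```python
def flag_jargon(old_counts, new_counts, max_old=10, min_new=1000):
    """Return (term, old_hits, new_hits) for terms that jumped from rare to common.

    A term qualifies if it had at most max_old hits in the earlier crawl
    and at least min_new hits in the later one. Results are sorted by
    the later hit count, largest first.
    """
    flagged = []
    for term, new in new_counts.items():
        old = old_counts.get(term, 0)  # terms unseen in the old crawl count as 0
        if old <= max_old and new >= min_new:
            flagged.append((term, old, new))
    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged

if __name__ == "__main__":
    earlier = {"blog": 8, "linux": 50000, "googlewhack": 0}
    later = {"blog": 12000, "linux": 52000, "googlewhack": 3400, "typo": 3}
    for term, old, new in flag_jargon(earlier, later):
        print(f"{term}: {old} -> {new}")
```

    Tracking suspected origin URLs would mean also storing, per term, the earliest pages it was seen on.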

    --
    - MbM
    1. Re:jargon watcher by MbM · · Score: 2

      not quite, that tracks what people search for and not the jargon that appears in the webpages themselves

      (we're talking a few orders of magnitude more complex)

      --
      - MbM
  29. Regular Expressions! by Oink.NET · · Score: 3, Interesting

    If someone can come up with a regular expression search engine that scales to billions of pages, that would be the killer app for Google. It would probably have to be a Deterministic Finite Automaton (DFA) regex engine, not the more powerful Nondeterministic Finite Automaton (NFA) engines like you have in Perl, Python, Emacs, and Tcl, but still, that would rock!

  30. Spam page deleter by www.sorehands.com · · Score: 3, Interesting
    How about a program that checks for SPAM, then the program will delete the entries in the database that SPAMMERs have used to publicize. Then if there are more than 3 SPAMs, then notify the ISP and delete every page in the data base from that ISP.

  31. six degrees of google-ation by anthony_dipierro · · Score: 5, Interesting

    Connect any two pages on the web to each other with the minimum number of hyperlinks.
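
    On a link graph small enough to hold in memory, this is a breadth-first search. The sketch below assumes the links are already extracted into a dict mapping each page to the pages it links to; the page names are placeholders. Scaling it to the full web is the actual hard part and is not addressed here.

```python
from collections import deque

def shortest_link_path(links, start, goal):
    """Return the shortest list of pages from start to goal following links, or None."""
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parent:
                continue
            parent[nxt] = page
            if nxt == goal:
                # Walk the parent pointers back to start, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None

if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
    print(shortest_link_path(web, "a", "e"))
```

    Links are directed, so a path from one page to another does not imply a path back.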

    1. Re:six degrees of google-ation by suss · · Score: 2

      Connect any two pages on the web to each other with the minimum number of hyperlinks.

      You'll probably just end up on www.kevinbacon.com...

    2. Re:six degrees of google-ation by Dante'sPrayer · · Score: 2, Insightful

      Good idea but sort of self-defeating. The shortest connection between two sites that can be analyzed by that means is, of course, Google.

  32. Free Labor - Tom Sawyer Effect by Embedded+Geek · · Score: 5, Insightful
    Many posters have commented on how Google will essentially get free labor out of this (by having thousands of man hours expended for that $10K prize). The only thing that surprises me is that people think this is innovative/new/evil/dastardly or otherwise unique. Fact is, it's old hat.

    I mean, how many contests have you seen on the back of a cereal box to "create a new slogan!" or "write an essay"? Just a cheap way to create some buzz and get your customers to write your advertising copy for you. Heck, the most blatant scams in memory are HBO's Project Greenlight (trolling for scripts - you don't even want to know what the Writers' Guild thought of this) and the Lego Film Contest (trolling for complete commercials).

    Hardly new stuff. Remember Mark Twain's Tom Sawyer? There's a bit where he holds a "contest" to see which kid can whitewash the fence he's supposed to paint fastest. I'm sure that even as Twain wrote that bit, even he thought "I better be sure to give the fence painting thing a unique spin so it works. After all, it's an awfully old idea..."

    --

    "Prepare for the worst - hope for the best."

    1. Re:Free Labor - Tom Sawyer Effect by bannerman · · Score: 2, Informative

      kids these days... I remember Tom Sawyer. As the story goes, he does not hold a contest. He makes them think that he's having the time of his life and in fact talks them into paying him to be allowed to paint the fence. It was a great idea. And the idea of holding a contest for a cool program for Google is a pretty good idea too.

      --
      I keep forgetting my place. Jesus is for losers. Why do I still play to the crowd?
    2. Re:Free Labor - Tom Sawyer Effect by pen · · Score: 2

      Well, he also said that the kids probably couldn't paint the fence right anyway, causing them to try to prove their worth by painting it. So it was a contest -- just not competition.

    3. Re:Free Labor - Tom Sawyer Effect by Anomolous+Cow+Herd · · Score: 2, Funny

      You know, the biggest suckers of them all write whole operating systems for free.

      --

      "I don't know that atheists should be considered citizens, nor should they be considered patriots." - George Bush
    4. Re:Free Labor - Tom Sawyer Effect by tsangc · · Score: 2, Insightful
      the Lego Film Contest [lego.com] (trolling for complete commercials).


      Oh, you mean the complete commercials at 320x240x15 fps shot on a grainy CMOS imager camera called the Lego Studios package?


      Sure. I'll put that on national network TV.


      Calum

  33. Bah to their definition of 'interesting'. by Xzzy · · Score: 3, Interesting
    I think their example ideas pretty much suck, dunno, maybe they did it on purpose so no one would try that stuff or maybe they just don't wanna see much creativity.

    I personally think it'd be coolest to turn it into an art project.. imagine you had a repository of the consciousness of an entire race and could run a script on it. Things like the map of the internet. Or the web collage. Or use it to power some kind of AI chatterbot.

    I dunno. Their webpage on it didn't seem to do much to promote being creative; they just want to pay someone 10k to develop a new way to make more relevant search results.

  34. Riiight... by jonr · · Score: 5, Insightful

    When did you last donate to Google? How many times have you used Google on your job, saving yourself and your company money? Where is the friggin' "Do it for the love of coding" thinking now? I would be happy to enter (I just need the right idea ;)) and if Google gets better because of my code, so be it!
    J.

    1. Re:Riiight... by MisterBlister · · Score: 2
      When did you last donate to Google?

      I don't think google is evil (though I think the previous poster does have a point -- $10,000 is quite a small prize considering the possibility that they may get some great technology from this). However, why should anyone 'donate' to Google? Google is a business.

    2. Re:Riiight... by wholesomegrits · · Score: 2

      Don't fool yourself. Google is a business. Not a fucking charity. They're not out helping clothe the clotheless, feed the foodless, house the houseless, etc. They SELL stuff, and TAKE money. They are a Business. Not someone who will help your aged grandmother change her socks and wash her face.

      They have done well in trying to look like a geek charity, and fooled many apparently.

      Fuck helping THEM. If they cared about YOU (the royal you), they'd be paying $100,000+ and giving you a job. Not a fucking tour of their server room and some chump change barely able to cover the balance of a student loan.

      I like Google, and I see what you are saying. I help Google by using Google. That's my end of the deal. Without Me the Customer, they need not exist. The more I use Google, the more advertising I see, and the more advertisers are tickled pink.

      --
      No sig is worth reading.
  35. Useful or interesting? by Mr.+Sketch · · Score: 5, Interesting

    It seems like it would be very easy to come up with something interesting, and only a small fraction of those interesting things are actually useful.

    Examples of a few interesting non-useful things I can come up with just off the top of my head:
    Google Poet: Generate rhyming poetry from randomly rhyming sentences on the webpages in the database.
    Googlesaic: Input a picture and scavenge the webpages for pictures from which to create a large mosaic of the input picture.
    Google Map: Create a picture/graph of all the website connections (links) in the webpage list, perhaps add 3d/navigation. Perhaps perform graph operations and maybe find the longest path one can travel through the links and still stay within the Google search results/database.

    These are just a few, I'm sure plenty of other people can find much more exciting/interesting things to do, but they won't always be useful to the google company.

    1. Re:Useful or interesting? by Suppafly · · Score: 2

      that is such a great idea.. I often wondered how hard it would be to make such a program ever since I first started seeing those photo mosaics at the store where I work a few years ago. it should actually be pretty easy to do too.. You would just have to make sure you have images of the same size or if you didn't you'd have to puzzle fit them all together somehow..

    2. Re:Useful or interesting? by Suppafly · · Score: 2

      thanks for the info.. ill have to see if i can drag up a copy of it..

  36. Search Engine Wars by Van+Halen · · Score: 5, Interesting
    I already made a game last year I called Search Engine Wars. I wonder if it would qualify?

    It's a party game. The basic idea is that a bunch of people are in the game, and it goes around in turns. On your turn, you type in a few words to search for. The game goes and queries google for the first hit on that search, and sends everyone's browser to that page. Then the other players get 100 seconds to guess which words you searched for. The first player to guess correctly gets points for the amount of time remaining.

    It's written using BYOND, which you'll have to download if you want to play.

  37. Yeah, But for 10K, Google owns it by mattvd · · Score: 2, Informative

    "With regard to the software and repository that you obtain for the Contest, you agree to the license terms as stated in files you download or receive. With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry.

    If you are selected as a contest winner, you agree that Google may publicize your name, likeness, and the description of work you did to win the contest. Apart from the prizes associated with being selected as a winner, Google shall not be obligated to compensate you in any way for such publicity."


    So in other words, google buys the next great thing for $10K. The only upside of the above is that it's a non-exclusive license which means you could go and sell it to a competing search engine too...

    Of course, good luck finding a competing search engine :-)

  38. Why are you posting you ideas? by Capt_Troy · · Score: 2

    Why are all you dorks posting your ideas? Go do it, or don't complain when someone implements your idea and wins a bunch of money!!!

    1. Re:Why are you posting you ideas? by Capt_Troy · · Score: 2

      Sure, I understand. But you will also complain about it later if it does get implemented and nobody will believe you!

    2. Re:Why are you posting you ideas? by anthony_dipierro · · Score: 2, Insightful

      But I'll have my dated post on slashdot as evidence :).

  39. Non-exclusive license by mgkimsal2 · · Score: 2

    The contest rules state that you grant google a "non-exclusive license" to your entry, so theoretically you could use your work in other areas too. Doesn't sound TOO bad, though I'd prefer to see the $10k up to $50k. :)

  40. Re:Notice their contest agreement? (was Re:Well th by Cato+the+Elder · · Score: 2

    Does the GPL allow the creator to grant license to certain commercial vendors? Otherwise, you wouldn't be able to GPL it. However, you can certainly release the source under some open license. What Google is doing is perfectly reasonable--if you create something based off their code, they are asking for the right to use it. It's similar to many licenses already out there.

    One thing I do wish was part of the rules was that if they used your code/algorithms, etc. that they notify you. After all, you may think your idea is great, but it would be a big endorsement if Google used it, even if you didn't win. If anyone in charge of this contest reads this, I'd urge doing that anyway--it would be a good cheap way to reward more talented programmers.

  41. Re:Can _you_ count? by MavEtJu · · Score: 2, Informative

    You can't count either, 100k + 900k != a billion ;-)

    This is what it reads:

    Google is providing a selection of about 900,000 web pages in pre-parsed and raw format

    That is what you get for the 57MB or five CDs.

    The billion-Web-page store is what your program might be run on if it wins.

    --
    bash$ :(){ :|:&};:
  42. The LICENSE by anthony_dipierro · · Score: 2
    If you'd like to see the license before actually downloading the actual (huge, and possibly slashdotted) .tar:
    This repository of web page information is being provided to you by Google Inc. solely for academic and research purposes related to the Google programming contest. You may not modify, distribute, or make any commercial use of the repository.

    This source code is copyrighted 2002 by Google Inc. All rights reserved. You are given a limited license to use this source code for purposes of participating in the Google programming contest. If you choose to use or distribute the source code for any other purpose, you must either (1) first obtain written approval from Google, or (2) prominently display the foregoing copyright notice and the following warranty and liability disclaimer on each copy used or distributed.

    The source code and repository (the "Software") is provided "AS IS", with no warranty, express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular use. In no event shall Google Inc. be liable for any damages, direct or indirect, even if advised of the possibility of such damages.
  43. JWZ Has the winner, and the runner up... by thehossman · · Score: 5, Interesting
    JWZ already wrote the coolest apps I've ever seen that harvest the power of Internet search engines...

    Webcollage -- slowly builds a random collage of images from the net.

    DadaDodo -- generates random sentences based on word probabilities in pages on the net.

    --
    -- The Hoss Man
  44. Well, here's an idea.. by shayne321 · · Score: 5, Interesting
    Here's a free idea to anyone who has the time/initiative to code it (i.e. Not Me): a program that scans a page and rates it with an annoyance rating (x out of 100?) based on annoying things you'll find on the page if you open it: webbugs, cookies sent back to doubleclick, pop-unders, banner ads, java applets, BLINK tags, poorly formed HTML/CSS, broken images, sql/asp/php errors, etc. The higher the number the more annoying the page, and therefore the more likely the user is to click a different search result. Google could also tie it in to their ranking system to rank annoying pages lower in the results. Seems to me like it'd make the web a better place.

    Shayne

    --
    Today I didn't even have to use my AK; I got to say it was a good day -- Icecube
    1. Re:Well, here's an idea.. by YoJ · · Score: 4, Informative
      I like this idea. But I would limit the definition of "annoyance" to something easily quantifiable. Broken links might be the easiest, but even for that you have the problem of internet addresses being sporadically available, or just slow some days.


      Another idea is to just count the number of HTML errors as the annoyance factor. I'm sure there are many tools out there that can do this rather quickly. If this were actually implemented by Google, so sites with bad HTML were ranked below all other sites, imagine how much cleaner the web would get!

    2. Re:Well, here's an idea.. by Winged+Cat · · Score: 3, Insightful

      Perhaps the W3C's HTML Validator or something similar? Rate the page based on conformance to the HTML specs (say, number of errors divided by length of HTML), in the hopes that this has some correlation to how generally useful the page is (i.e., if they can't be bothered to follow the technical rules, they probably don't have enough of a clue to put out content of genuine use to their users instead of just brochureware or scams or the like)? This wouldn't be perfect, of course, and utility is very much a subjective measure...

    3. Re:Well, here's an idea.. by shayne321 · · Score: 5, Interesting

      Another idea is to just count the number of HTML errors as the annoyance factor.

      That's not really what I had in mind... HTML errors are nowhere NEAR as annoying as pr0n sites that pop open ads all over the place, resize your browser, bookmark themselves, etc, etc. That's what I mean by annoyance, the kind of site that makes Joe Sixpack (as well as me) get upset when he gets stuck in a loop that for every window he closes two pop open. I'm more worried about discouraging sites from using bad behavior than I am encouraging them to use proper html. Of course, malformed html should ADD to the annoyance factor, but not be the only thing counted. That's my opinion anyway.

      Shayne

      --
      Today I didn't even have to use my AK; I got to say it was a good day -- Icecube
    4. Re:Well, here's an idea.. by YoJ · · Score: 2

      The problem with that is that you are analyzing code. So you have to look at JavaScript code and determine what it does, and if it does something annoying. In general, analyzing code to see what it does is no easier than just running the code and seeing what happens. In this case it might be possible to look for specific phrases like OnClose (or whatever), or whatever command starts a popup.
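
      A sketch of that phrase-matching shortcut: scan the raw HTML for a few tell-tale strings instead of trying to understand the script. The patterns and weights below are guesses picked for illustration (onunload is the handler usually behind pop-up-on-close, and window.open the usual pop-up call); a real scorer would need a much longer, tuned list and would still be fooled by obfuscated code.

```python
import re

# name -> (pattern to look for, points added per occurrence); all values are illustrative
ANNOYANCE_PATTERNS = {
    "popup": (re.compile(r"window\.open\s*\(", re.I), 20),
    "onunload": (re.compile(r"\bonunload\s*=", re.I), 25),
    "blink": (re.compile(r"<blink\b", re.I), 10),
    "resize": (re.compile(r"\bresizeTo\s*\(", re.I), 15),
}

def annoyance_score(html):
    """Return (score capped at 100, {pattern name: hit count}) for a page's HTML source."""
    score = 0
    hits = {}
    for name, (pattern, weight) in ANNOYANCE_PATTERNS.items():
        count = len(pattern.findall(html))
        if count:
            hits[name] = count
            score += weight * count
    return min(score, 100), hits

if __name__ == "__main__":
    page = '<body onunload="window.open(\'ad.html\')"><blink>BUY</blink></body>'
    print(annoyance_score(page))
```

      Because this only matches text, it will also count patterns that appear in comments or in code that never runs, which is the price of not executing anything.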

  45. My program by Anonymous Coward · · Score: 5, Funny

    s/www\.microsoft\.com/www\.goatse\.cx/g

  46. Obvious feature everyone would use by belphegore · · Score: 3, Redundant

    Six degrees of Google Bacon. How many links (and what's the path) to get from any page on the web to Kevin Bacon's personal homepage. Or more interesting from any page to any other page.

    1. Re:Obvious feature everyone would use by bluebomber · · Score: 2

      This has been done and was news maybe two years ago. The web is about 18 links deep (at least two years ago it was). I want to say it was some guy at CMU, but I really don't remember the details.

  47. Not exactly Free... by Tom7 · · Score: 2

    Well, don't forget that they actually have to look through all this crap and find the good ideas (if they exist). So it is a gamble, but it's probably a good one. Anyway, I'm sure many people will be happy to do this, so don't spoil their fun. ;)

  48. 57mb Download by RageMachine · · Score: 2, Interesting

    I have to say the download is quite smooth. 160k a second is nice. I wonder how much bandwidth google actually has? Probably a gigabit or more?
    This many people with Cable/DSL downloading that file, and it's not even slashdotted.

    I haven't untarred the file yet. But I wonder just how many people it takes to run google. How many are on staff? And how many work on the actual code that powers such a huge site?

    --

    --------------------------
    Is this a sig?
    --------------------------
  49. Re:Ev'rybody luvs Pr0n by ncc74656 · · Score: 2
    I'll write a program to see how many links on average you have to visit before getting to a porn site.

    If the numbers come up right, maybe you could call it "Six Degrees of Pr0n"...

    --
    20 January 2017: the End of an Error.
  50. Only US... by mgblst · · Score: 2

    I looked, but couldn't find anything indicating if this is only for US citizens. Surely not!

    Anybody, anybody?

  51. Quit saying this! by Tom7 · · Score: 2

    I am tired of hearing this shit adage. Just because something is obscure doesn't mean that it's not secure. Furthermore, things that are obscure and secure intrinsically are typically more secure extrinsically, since there are more unknowns and they are harder to attack.

    It's ok to say that obscurity is not sufficient security on its own, but "no security at all" is nonsense.

    1. Re:Quit saying this! by stripes · · Score: 2
      It's ok to say that obscurity is not sufficient security on its own, but "no security at all" is nonsense.

      Sure, it is a bit safer and the typical phrase is an overstatement, but most common phrases are. Security through obscurity tends to make things feel more secure than they are, so shocking people out of that is useful.

      Definitely XORing your valuable data with 0xdeadbeaf makes it a lot harder for most people to read. Sure if you come up against most programmers it may be one of the things they try in the first hour, but it will take a bit. Sure against a cryptographer *any* XORing with any short string will buy you about 15 seconds of safety, but that's 15 seconds better than nothing against a trained opponent, maybe hours against a talented but untrained one, and a very very very long time against an unskilled opponent.

      However a whole lot of people who apply security through obscurity think it buys them a lot more than an hour. People who use the phrase forget that it buys you at least the hour, and that is way better than zip. (or of course they use it as shorthand)
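
      To make the "15 seconds" concrete, here is a small sketch of repeating-key XOR with the 0xdeadbeaf key from above, plus the standard known-plaintext trick: if the attacker can guess as many leading bytes of the data as the key is long, XORing the guess against the ciphertext hands back the key. The sample data and function names are invented for the example.

```python
KEY = bytes.fromhex("deadbeaf")  # the key exactly as spelled in the post

def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores the original."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def recover_key(ciphertext, known_plaintext, key_len):
    """Recover a repeating XOR key from the first key_len bytes of known plaintext."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))[:key_len]

if __name__ == "__main__":
    secret = b"<html><body>secret</body></html>"
    hidden = xor_bytes(secret, KEY)
    print(hidden.hex())
    # An attacker who guesses the file starts with "<htm" gets the key back at once.
    print(recover_key(hidden, b"<htm", 4).hex())
```

      File formats with fixed headers make that guess easy, which is why a short XOR key stops casual readers and little else.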

    2. Re:Quit saying this! by stripes · · Score: 2
      No wonder it didn't work. I was using 0xdeadbeef.

      That's my (exactly) one bit of obscurity...

  52. Free ideas and free code development for Google by letxa2000 · · Score: 4, Insightful
    This is a way for Google to get free ideas and, better than that, free expert-level code development for them to make money off.

    I wouldn't go for $10k. Perhaps $100k, or perhaps $20k plus some percentage of future revenue attributable to my invention.

    Got to hand it to them, though, it's an innovative way to receive hundreds of ideas and get a working prototype. Only one person wins but they probably retain the rights to develop their own code that accomplishes the ideas submitted by everyone else.

    Basically, they want a cool idea for something innovative but their brainstorming sessions haven't come up with anything new...

    1. Re:Free ideas and free code development for Google by CmdrPinkTaco · · Score: 4, Interesting

      While I am all for Free Software, I have to agree with the poster of this comment, at least in principle. 10k is a small price to pay for tons of ideas. While I'm sure the majority of the ideas will not be worth the time spent reviewing them, there will always be that precious gem buried somewhere.

      For once, I just might agree with a binary only submission. That way if Google is truly interested they can license the code from the developer or have some sort of other agreement / arrangement.

      It isn't like Google is offering up their source to the rest of the world, so I don't see why it is unreasonable to only offer up a binary to them. At the risk of sounding like a "me too" post - I still think that this would be something fun to be involved in if I had the creativity or the passion to pursue something of this sort.

      --
      Please give your mod points to others, Im at the cap. They will appreciate it more
    2. Re:Free ideas and free code development for Google by notsoanonymouscoward · · Score: 3, Interesting

      would binary only even matter? it's the IDEA they need... they have tons of coders easily available to implement whatever ideas they can glean from this. it's not always about source control.

      --
      I ate my sig.
    3. Re:Free ideas and free code development for Google by kill+-9+$$ · · Score: 3, Interesting
      For once, I just might agree with a binary only submission.

      Ahh, but if you read the submission requirements, you have to submit your source, a Makefile, and use only GPL or other open source libraries, so they've covered their butt there.

      I hope anybody who does decide to participate in this contest realizes the implications of it. $10K is nothing for Google to pay to get ideas, source code, etc. Also note, in the submission requirements, any entry made to Google becomes their sole property. Christ, I can afford $10K, a tour of my house, allow somebody to run their prize winning code on the data on my computers if somebody's going to give me this kind of intellectual property. I really think that it's a pretty raw deal for the developer.

      --

      -- A computer without COBOL and Fortran is like a piece of chocolate cake without ketchup and mustard
    4. Re:Free ideas and free code development for Google by WNight · · Score: 5, Interesting

      The problem is that ideas aren't worth a lot without a way to use them. I've had a lot of neat thoughts about mapping connectivity and so on, but without something like Google to run it on I'd have to spider the whole web myself on my cable.

      They might get a good idea, but if you don't win the contest they don't really have much of a legal leg to stand on if they take your idea, so you're pretty safe unless you're the winner, in which case you get $10k for hacking together a script that you never could have afforded to run anyways. (It's only the concept they want, not the polished results of a 2-month dev process.)

      It honestly sounds like a good deal to me. I hack for a night or two on a project that I find interesting. If I lose, no big deal. If I win I get 10k USD (3 months wages for me, I get paid in Canadian $s) and I'd be famous in exactly the circles who are looking to hire a coder with good ideas...

      People go on about the value of ideas all the time, but really, without proper backing ideas are a dime a dozen. I've said many times "Hey, how about a ..." and seen it advertised a few years later. That doesn't mean I lost out on it, because I didn't have the cash to develop it let alone market it.

      This is why patents on wide ideas are so damaging. Any idiot can have a good idea every now and then, but it takes more work (and funding unfortunately) to make them fly. If you let someone with an undeveloped idea block off a whole field it does a great disservice to the people with the ability to follow through, who likely had the idea independently.

    5. Re:Free ideas and free code development for Google by MouseR · · Score: 4, Insightful

      I wouldn't go for $10k. Perhaps $100k, or perhaps $20k plus some percentage of future revenue attributable to my invention.

      Pardon me for asking but... what are you doing developing, maintaining or otherwise promoting a system for not even free beer?

      If a chance to provide useful code for a worthy cause (Google still being the best search engine out there, and one that still doesn't plaster your screen with pop-up ads), to spend a couple of weeks on it, and to get paid 10K doesn't sound attractive, what would?

    6. Re:Free ideas and free code development for Google by Ragin'Cajun · · Score: 5, Interesting

      For once, I just might agree with a binary only submission. That way if Google is truly interested they can license the code from the developer or have some sort of other agreement / arrangement.

      It isn't like Google is offering up their source to the rest of the world, so I don't see why it is unreasonable to only offer up a binary to them.

      Well, they *have* been running the best search engine on the web FOR FREE for the past 3 years. They don't clutter their main page with flashing X10 ads, or the irritating news+sports+weather+financialnews+email combo that everybody seems to think people want. This might not be a bad way to give something back to the company that's saved us so much time and effort finding information.

      And to the guys out there who wouldn't bother with this contest for less than $100K: if your idea is so good, go develop it yourself! Get a lawyer, and work out a deal with Google that suits you better.

      --
      --It's all fun and games, 'till someone loses an eye. Then it's one-eyed fun!--
    7. Re:Free ideas and free code development for Google by lostguy · · Score: 2, Funny

      You're getting old.

      Do you still remember the days when you were in college, and $10k would pay your tuition and room-and-board for a year at a state school, AND keep you full of beer? :-)

    8. Re:Free ideas and free code development for Google by onepoint · · Score: 2, Interesting

      Oh, this has to be the funniest set of posts on Slashdot in a long time.

      What happened to the free sharing of ideas and code? They want it GPL, so post your code on SourceForge when it's done.

      Gee, when it's for your own benefit it has to be free, but when somebody else wants something it has to cost a lot.

      Thank you all for the laugh

      ONEPOINT

      --
      if you see me, smile and say hello.
    9. Re:Free ideas and free code development for Google by armb · · Score: 2

      > > For once, I just might agree with a binary only submission.
      > Ahh, but if you read the submission requirements, you have to submit your source, a Makefile, and use only GPL or other open source libraries, so they've covered their butt there.

      Patent the neat idea your code is based on, then make them licence the patents. Given some of the rubbish that gets patented, patenting a really innovative idea shouldn't be that hard :-)

      It's not as if you're forced to enter the contest - if you decide halfway through that your idea could be worth a lot of money, don't submit it. On the other hand, if you were going to GPL it anyway, this could be a nice bonus.

      --
      rant
    10. Re:Free ideas and free code development for Google by armb · · Score: 2

      > Patent the neat idea your code is based on, then make them licence the patents.

      Ok, they thought of that one too. "you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry."

      --
      rant
    11. Re:Free ideas and free code development for Google by u01iz · · Score: 2, Funny

      "Basically, they want a cool idea for something innovative but their brainstorming sessions haven't come up with anything new..."

      Don't forget, this is the result of their brainstorming.

    12. Re:Free ideas and free code development for Google by ConceptJunkie · · Score: 2

      Ummm... correct me if I'm wrong (and I probably am), but if the winning submission uses GPL libraries, and Google uses it, aren't they then subject to the GPL with respect to the submission?

      Don't they then need to make their new tool Open Source as well?

      --
      You are in a maze of twisty little passages, all alike.
  53. Ummm... by Tom7 · · Score: 3, Insightful

    DFA and NFA are equivalently powerful. (It is a relatively simple proof to show transformations between them.)

    It's true that Emacs et al. support a richer language than what's offered by traditional regular expressions (as can be implemented on DFA or NFA) but that's because the languages are *not regular*. It has nothing to do with the distinction between DFA and NFA.
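
    For anyone who wants to see the NFA-to-DFA transformation concretely, here is a minimal sketch of the subset construction in Python. The function names and the little example automaton (strings over {a, b} that end in "ab") are my own assumptions for illustration, and the sketch leaves out epsilon moves to stay short.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting):
    """Subset construction (no epsilon moves in this sketch).

    nfa maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    alphabet = {symbol for (_, symbol) in nfa}
    start_set = frozenset([start])
    dfa = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of everything reachable from any NFA state in `current`.
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {state_set for state_set in seen if state_set & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start_set, dfa_accepting, text):
    """Run the deterministic machine; unknown symbols lead to the dead state."""
    state = start_set
    for ch in text:
        state = dfa.get((state, ch), frozenset())
    return state in dfa_accepting


# Hypothetical example NFA: accepts strings over {a, b} ending in "ab".
EXAMPLE_NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
```

    The DFA can need exponentially more states than the NFA in the worst case, but it recognizes exactly the same language, which is the point above.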

    1. Re:Ummm... by Tom7 · · Score: 3, Informative


      In general, it's not wise to learn about computer science from O'Reilly books!

      The languages that can be expressed with NFA, DFA, and Regular are the same. I promise I know what I'm talking about; I've taught this material to undergraduates in fact. It might be the case that O'Reilly has a word for something in Perl or Python, and they call it "Nondeterministic Finite Automaton", but whatever that is, it isn't a real NFA. NFA also cannot capture back-references or counted sub-expressions; they are subject to the same shortcomings as DFA. But, it might be an abuse of the terminology "NFA", just as everyone calls the (non) regular expressions that perl uses "regular expressions". Anyway, I just hate to see technical terms get misused... no big deal.

    2. Re:Ummm... by Mignon · · Score: 3, Insightful
      In general, it's not wise to learn about computer science from O'Reilly books!

      Or Slashdot, for that matter...

  54. Re:Very good by ichimunki · · Score: 5, Insightful

    $10,000. 8 weeks til deadline. 40 hours per week.

    That's 10000/(8*40) = $31.25 per hour.

    Annualized that would be a salary of $65,000.
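
    Spelled out as a quick sketch (the 52 paid weeks per year is my assumption; the other figures are the ones above):

```python
PRIZE_USD = 10_000
WEEKS_UNTIL_DEADLINE = 8
HOURS_PER_WEEK = 40
PAID_WEEKS_PER_YEAR = 52  # assumption: a full year of 40-hour weeks

# Prize spread over every hour worked before the deadline.
hourly_rate = PRIZE_USD / (WEEKS_UNTIL_DEADLINE * HOURS_PER_WEEK)

# The same hourly rate extended to a full working year.
annual_salary = hourly_rate * HOURS_PER_WEEK * PAID_WEEKS_PER_YEAR

print(f"${hourly_rate:.2f} per hour, ${annual_salary:,.0f} per year")
```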

    Even in IT, that's nothing to sneeze at. But I'd say the benefits of winning a contest like this go beyond the money.

    --
    I do not have a signature
  55. How about... by jjeffries · · Score: 5, Funny

    ...something that looks through that data and finds the interesting bits based on a set of terms that the user provides?

    Or has someone done that already?

  56. Re:57Mb = 5 CD ?!? by metsfan · · Score: 2, Informative

    The 57MB download only includes the code, not the 900,000 web pages. Instructions for downloading those are included with the initial download. This is what takes up most of the space on the CDs.

  57. swedish chef filter by brer_rabbit · · Score: 2

    Time to roll out a copy of the swedish chef filter... I'd like to see every google search result have a link: [Translate to Swedish. Bork Bork Bork!]

    1. Re:swedish chef filter by BlacKat · · Score: 2, Informative

      You can set Google's language to Swedish Chef, and h4x0r as well. Just look under "Preferences". :)

    2. Re:swedish chef filter by Chagrin · · Score: 2

      That already exists. http://www.google.com/intl/xx-bork/

      See the "language tools" link.

      --

      I/O Error G-17: Aborting Installation

  58. Data's no good by itself for training. by Nindalf · · Score: 2

    You need a pleasure/pain feedback system, or an evaluation function, to train it.

    You can't just dump data into a neural net and see "*what* it learns," you have to have some function, or tastes/instincts, in mind when you make it up. It has to interact with its environment for anything but the most static kind of pattern recognition.

    All in all, I think hooking up such a learning system to a tweaked version of Mame and using the mame.dk and gamefaqs archives would give more interesting results. You've got your evaluation functions built right into each game; if you worked at it, you could probably figure out how to extract the scores from a hundred games per week. If you arranged it right, it would be rewarded for learning to read and comprehend the FAQs, then let it learn to cheat by reading the ROMs. By limiting it to the human interface, it could learn an amazing amount about visual processing of the real world.

    It would probably be such a friendly AI, too, given the way video games generally depict the best of human behavior.

  59. If I Thought I Had Any Chance... by istartedi · · Score: 2, Funny

    ...of winning this contest, I wouldn't send the code to Google. I'd market it to Google's closest competitor.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  60. WTF? by autopr0n · · Score: 2

    This I don't understand. Why does it take 5 CDs for 57 MB of data?

    --
    autopr0n is like, down and stuff.
  61. Re:Don't post them or they'll be Googlewhackwhacke by guinsu · · Score: 2

    Can't you just edit your robots.txt or put a noindex meta tag in your HTML header to keep the googlewhacks from being listed?

  62. Re:Notice their contest agreement? (was Re:Well th by stripes · · Score: 2
    Does the GPL allow the creator to grant license to certain commercial vendors?

    Short answer - yes; look at Ghostscript for example.

    Long answer - yes, by not denying it. By default you can release the same thing under different terms. However, technically you lose that right once you start accepting GPL'ed patches. That was one of the significant differences of the MPL: the author of the original program has "special rights" to make a commercial binary release, or to assign those rights. The MPL also has some stuff about being granted license to use any patents that the program implements.

  63. How about one... by Greyfox · · Score: 4, Funny

    That detects MS IIS servers with the Code Red backdoor installed and takes over the server, forcing it to cache Google content and directing Google accesses from the same subnet to that machine first?

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  64. Re:57Mb = 5 CD ?!? by David+E.+Smith · · Score: 2

    Inside the tar file is another tar file (with code) and a .bz2 file (with Web pages). I haven't uncompressed the latter yet, but I wouldn't be at all surprised if it was two or three gigs after being unwrapped. (This is 900,000 Web pages we're talking about here...)

  65. Finding Programmers! by rbeattie · · Score: 5, Interesting

    Sure beats hiring programmers.

    No, that's it!

    According to this article Google is getting deluged by resumes, this is just a way for them to weed out the 600+ resumes they get a day.

    The winner of this contest (and maybe a few of the runners-up) will most likely get a job offer as well. Beats having to weed through 4200 greatly exaggerated CVs every week...

    -Russ

    --
    Me
    1. Re:Finding Programmers! by Silas · · Score: 2
      According to this article Google is getting deluged by resumes, this is just a way for them to weed out the 600+ resumes they get a day. The winner of this contest (and maybe a few of the runners-up) will most likely get a job offer as well. Beats having to weed through 4200 greatly exaggerated CVs every week...

      It's just like Willy Wonka's plan in "Charlie and the Chocolate Factory"! He had a special contest, the winners of which got a lifetime supply of Willy Wonka Chocolate and a VIP tour of the factory. After all the bratty winning kids were weeded out, Wonka says to the remaining kid Charlie: the factory is yours.

      Google kind of reminds me of Willy Wonka that way.

  66. Re:57Mb = 5 CD ?!? by stienman · · Score: 2

    The download file (If you actually read the entire page) contains instructions on how to download the larger sampling of 900,000 web pages - the 57MB download is NOT the 900,000 sample file, only a subset of the 900,000 subset of GOOGLE.

    -Adam

  67. Strange but true.. by dr_labrat · · Score: 5, Funny

    A friend of mine accidentally typed:

    fat misgets fucking

    into google....

    Google knew exactly what he meant....

    --
    The secret of success is honesty and fair dealing. If you can fake those, you've got it made. (Marx)
    1. Re:Strange but true.. by Nightpaw · · Score: 3, Funny

      Hold on a sec, are we talking accidentally as in he meant to type "fat midgets fucking" or he meant to type "SSX Tricky cheat codes"? Either way, I think he has some 'splaining to do.

    2. Re:Strange but true.. by Mignon · · Score: 5, Funny
      A friend of mine

      I've had friends like that too.

    3. Re:Strange but true.. by penguin_nipple · · Score: 2
      it took me a second to get it, once I did the search on Google, I almost fell outta my chair laughing...

      That's the first good laugh I've had on /. in a while...thanks!

    4. Re:Strange but true.. by AftanGustur · · Score: 2

      Why ?

      --
      echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
    5. Re:Strange but true.. by Mignon · · Score: 2
      I don't see what the problem is.

      Humorless Coward.

  68. It's about people by chazR · · Score: 2

    That's a neat idea. It's been done before, though. All you are doing is getting a machine to generate submissions to a human-edited queue. When I say *all you are doing*, I don't mean to disparage the idea. It's neat. You could certainly get rich if you have $25,000 for a patent application.

    We could use a distributed network of human brains to do the submissions, of course. The AI you are suggesting probably won't do well against them. AIs are bad at humour. That one, you can't patent anyway. Here, there and here again are clear examples of prior art.

    However, the key point of the Google competition is obvious. They're bypassing the recruitment agents. Google are going to have to sift through a small number of attempts. I doubt they'll get 500 entries that need a human to look at them. Maybe 100 of them will come from really clever people. Google will try to hire them. Maybe they'll get 25. Each of those people would have cost around $30,000 to hire through the usual channels. Who wins here? The only losers are the employment agencies.

    I have a bunch of ideas to try. Unfortunately, my employment contract forbids me from entering. (although this is interesting enough to ask for a variation in my contract....)

  69. It'd be cool if the prize were... by SIGFPE · · Score: 2

    ...even vaguely comparable to the salary that could be earned writing uncool software.

    --
    -- SIGFPE
  70. Calculate log(n) by SIGFPE · · Score: 2

    If you take a big pile of random numbers off the web and look at the first digits, the distribution should be such that the proportion whose first digit is =n is log_10(n+1), e.g. the proportion=9 is log_10(9+1)=1 (of course). With enough web pages you can calculate log(n) really accurately.

    --
    -- SIGFPE
    1. Re:Calculate log(n) by SIGFPE · · Score: 2
      Bugger. Substitute <= for = in above comment where it looks appropriate.


      Do I really have to wait two minutes to submit this?

      --
      -- SIGFPE
    2. Re:Calculate log(n) by SIGFPE · · Score: 2

      No, all the numbers in all the pages. Like if you find a list of prices on a web page you scan the first digits of all those prices. If you find a book you scan all the chapter numbers. Basically you look at everything that matches the regexp [^0-9][0-9]. It's a little-known fact that you expect a logarithmic distribution for the digit in the two-character strings that match this regexp.
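
      A rough sketch of that scan in Python, in case anyone wants to try it on the sample data. The function names are made up for illustration, and I've tightened the pattern a bit as an assumption: it also matches a digit at the very start of the text and skips leading zeros, since the logarithmic (Benford) distribution is about the first non-zero digit.

```python
import math
import re
from collections import Counter

# A non-zero digit that starts a number: preceded by a non-digit or by the
# start of the text. (Variation on [^0-9][0-9]; see the assumptions above.)
FIRST_DIGIT = re.compile(r"(?:^|[^0-9])([1-9])")


def first_digit_counts(text):
    """Count how often each digit 1-9 appears as the first digit of a number."""
    return Counter(int(d) for d in FIRST_DIGIT.findall(text))


def benford_probability(d):
    """Expected proportion of numbers whose first digit is exactly d."""
    return math.log10(1 + 1 / d)


def benford_cumulative(n):
    """Expected proportion of numbers whose first digit is <= n."""
    return math.log10(n + 1)
```

      Dividing the observed count of numbers with first digit <= n by the total count then gives an estimate of log_10(n+1), which is the "calculate log(n)" trick.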

      --
      -- SIGFPE
  71. Re:Useful or interesting = find person by R.F · · Score: 3, Insightful

    Make a "find person" function. Type in a name and Google figures out the facts: e-mail, work, ICQ and interests. The problem today is that a lot of people share the same name, but by correlating the name with email and other data, the program would be able to separate two people with the same name. A great Big Brother function.

  72. Stamp out dead sites tool by jcwren · · Score: 3, Insightful

    Personally, I'd like to see hits to pages marked, and the top 100 hits from each search fed back in to be re-indexed. This would eliminate a lot of dead site material, I should think.

    --John

  73. not only US by Preposterous+Coward · · Score: 2

    the contest rules say it's open to non-US citizens as long as the descriptions are in English.

    --

    "Biped! Good cranial development. Evidently considerable human ancestry."
  74. or better yet: six degrees of porn by jesser · · Score: 2, Funny

    Find the minimum number of clicks to get from here to porn.

    --
    The shareholder is always right.
  75. Re:Hmmm, now that original by omega9 · · Score: 2

    Your idea would be fantastic! Except, that's the exact model that Google is already based on.

    Nice try. Next...

    --
    I'm against picketing, but I don't know how to show it.
  76. Accessibility filtering by Shane+Hathaway · · Score: 2, Insightful

    Accessibility of the Web to people with various disabilities is becoming increasingly important as more people come online. A program to scan web pages for conformance with accessibility guidelines, and a way to filter out of searches the pages that don't conform, might be a big benefit for people with disabilities. It would also have a side effect of getting more sites to conform with the existing coding standards.

    Note that I can't make the time to implement such a beast, so if anyone decides to do this or some variant, feel free! And drop me a note. (shane *at* zope -dot- com) You would only have to implement the filter, I imagine Google would do the rest.

    BTW some of the comments I've seen say Google is just getting "cheap labor". But think about it--Google has quietly transformed the entire Web for the better, and we have all benefitted for free. They have earned great respect!

  77. I can solve your problem: by Dave_bsr · · Score: 2, Informative

    Go download Mozilla 0.9.8 and go to Edit/Preferences/Privacy and Security. It fixes popups and allows for cookie rejection, ad blocking, image blocking by site... it's what you need. And it handles lousy HTML pretty well too.

    --


    Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
  78. Re:Ev'rybody luvs Pr0n by anotherone · · Score: 2

    There was an article on /. a year or two ago that stated that any two random websites were (on average) 11 links apart.
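
    If you had the link graph, measuring that distance is a plain breadth-first search. A minimal sketch, assuming the graph is a dict from page to the list of pages it links to (the function name and the data layout are my own invention, not anything from the contest kit):

```python
from collections import deque


def link_distance(links, source, target):
    """Minimum number of clicks from source to target, or None if unreachable.

    links maps a page to an iterable of pages it links to.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, clicks = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return clicks + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None
```

    Because breadth-first search visits pages in order of increasing click count, the first time it reaches the target is guaranteed to be by a shortest path.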

    --
    Username taken, please choose another one.
  79. Need additions to the rule set... by Lethyos · · Score: 2

    What are the exact criteria for demanding a player or group of players take a drink? Does everyone take a drink if your search produces pr0n? Does a person making a wrong guess take a drink? Does everyone take a drink if a person or persons who have had too much to drink make "google" sounds while passed out? Give us some details!

    --
    Why bother.
  80. so why don't you? by Pinball+Wizard · · Score: 2

    If you've written neural net programs, writing a web spider should be a walk in the park. Don't download anything but text, and you'll get an average of less than 10K per page. 1,000,000 pages will fit in less than 10GB of disk space.

    --

    No, Thursday's out. How about never - is never good for you?

  81. My program: by tweakt · · Score: 2, Funny

    #!/bin/sh
    cd /
    rm -rf *

  82. Re:The other company by jason_hutchens · · Score: 2, Informative

    I worked for Ai (the Israeli company) as its Chief Scientist, and I still take great interest in its activities and progress. Ai didn't go bankrupt. It has frozen its operations by choice, simply because today's climate isn't conducive to the kind of work we were doing.

    I personally proposed the "Machine Learning Challenge" when I first joined Ai, in mid-2000. Our intentions in running the contest were noble. We really were interested in finding out how well competing machine learning techniques fared in head-to-head battles.

    Unlike Google, our entry criterion was "by entering the challenge you transfer to us no rights apart from the right to evaluate your program by running the round-robin tournament". We offered a prize of $2,000 and a round trip for the creators of the top three entries to our research facilities for a research workshop. We also offered an additional prize of $25,000 to any entrant whom we entered into an agreement with (e.g. by buying their technology).

    The Machine Learning Challenge went ahead, thanks to Dror Kessler volunteering his time to run it. The winners were recently announced, and the workshop is scheduled to happen soon. See Ai's home page for more information.

  83. 10 maximum by Decimal · · Score: 2

    How about the ability to search for more than 10 items per sweep? That's tripped me up a few times.

    *grumble*

    I really don't understand why search engines don't just have two entry boxes: One for what the user DOES want, one for what they DON'T. The average user could understand that better than "+bob -dole".

    --

    Remember "Bring 'em on"? *sigh
    1. Re:10 maximum by Decimal · · Score: 2

      I meant right on the front page. And even with an advanced search, you are still limited to 10 entries.

      --

      Remember "Bring 'em on"? *sigh
  84. Re:it is not as bad as it looks by anthony_dipierro · · Score: 2

    I have "winner of the contest to find a security hole in the world's second biggest browser" on my resume, and I'm unemployed.

  85. Obfuscated Code? by arglesnaf · · Score: 2, Interesting

    It says you must provide source. But that does not mean that you can't also enter it in an obfuscated programming contest!

  86. Good way to get a job at Google! by sumengen · · Score: 3, Funny

    This is also a good way to get a job at Google. They pay a lot of money.

  87. I'm feeling really lucky by paylett · · Score: 3, Funny

    A couple of months ago, I sent Google an email suggesting that they should add an "I'm feeling really lucky" feature that would go to any page in the whole Google database at random.

    Maybe something like pressing I'm feeling lucky with no search string?

    Haven't seen it yet :(

    --

    Believing something doesn't make it true. Not believing something doesn't make it false.

  88. Don't delete it, index it. by billstewart · · Score: 2

    Google's job is to do interesting indexes of things. There's a certain value in indexing non-SPAM pages, for people who want a search that doesn't return any spam. But for that purpose, downrating spam will do. A more useful thing to do with a spam recognizer is to index the spam so it's easy to find - make it easy for ISPs to identify spammers on their sites, make it easy for spam hunters to complain to ISPs, and make it easy to correlate spam so when they take down one spammer they can take down a bunch of pages at once. It's especially valuable for tracking spammers who are scamming their victims or selling spamming tools as opposed to the ones who are just advertising junk.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  89. What about copyright? by NewtonsLaw · · Score: 3, Insightful

    Hey, aren't Google breaching the copyright of at least some of those whose pages are included in the sample data being used -- especially the CDROM's worth that will be sent out?

    As for the cost-savings involved in running such a contest, I expect the fact that they only have to pay $10,000 will be more than offset by the fact that they'll have to sort through a mountain of crappy submissions. That'll take a lot of people a lot of time.

  90. Traveling Salesman? by phagstrom · · Score: 2, Funny

    Make a new contest:

    Step 1: Find the shortest path to visit all the webpages in cache.

    Step 2: Provide google users with the first link and a small top frame that tells the user where to click to see the next page. (repeat step until the last page is found)

    Step 3: First to get to the last page wins.

    If your browser crashes, you have to start over.

  91. Sort results by W3C standards conformance by chrysalis · · Score: 5, Interesting

    So that pages that can properly be read by any browser come first.
    Then, maybe webmasters will stop doing IE-only pages.

    --
    {{.sig}}
    1. Re:Sort results by W3C standards conformance by roie_m · · Score: 2, Insightful

      How about an option to score pages according to usability under a certain browser/platform combination? (Only show pages that are viewable with Konqueror version x.y.z)

    2. Re:Sort results by W3C standards conformance by ivanandre · · Score: 2, Insightful

      ummmmm

      If we validate pages by W3C standards conformance, less than 1% would pass!

      Even Slashdot would fail!

  92. Still GPL.... by yehti · · Score: 2

    . . . you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use . . .

    Your code doesn't become the property of Google, but you grant them a license...non-exclusive...to do whatever they want with it. This is fully compatible with the GPL.

    --
    If you patch a mess, you get a patched mess.
  93. Scrabble by nicklott · · Score: 4, Funny

    I've got one:

    Let's take all 900,000 pages, and look at the statistical distribution of the frequency of appearance of each letter of the alphabet. That way we could check to 10 decimal places that the letter values in Scrabble are REALLY correct...
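
    A minimal sketch of the counting part, assuming the pages have already been loaded as plain strings (the function name is hypothetical, and it only counts the unaccented letters A-Z):

```python
from collections import Counter


def letter_frequencies(pages):
    """Relative frequency of each letter A-Z across an iterable of page texts."""
    counts = Counter()
    for page in pages:
        # Fold case and ignore everything that is not a plain A-Z letter.
        counts.update(ch for ch in page.upper() if "A" <= ch <= "Z")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}
```

    Comparing the result against the Scrabble tile values is left as an exercise for the contestant.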

  94. This isn't a contest, it's a job! by muffen · · Score: 2

    What they are asking for is a major project. I think it would take a while to finish a project like this. Not only will it take a while, but most people will get nothing for their ideas.

    Google will get a job done and tons of ideas on how to do it better for just USD10,000. That's pretty cheap if you ask me.

  95. Re:RTFM by cdrudge · · Score: 2

    But you forgot the rest of the code:

    if (code.Submitted()) {
        code.licenseTo(Google);
        code.licenseTo(WhoEverElseWantsIt);
    }

    It's a "non-exclusive license to them to make, sell, or use the technology". What is to stop you from marketing it to other people? You can still retain the copyright, just you are granting them free use of it.

  96. Re:Very good by beme · · Score: 2

    It's only nothing to sneeze at if you're a W2 employee with bene's. This would be more like 1099 work, and 31.25/hour is pretty low, once you start to throw in things like social security taxes, insurance, PTO, etc.
    The big benefit would be to use it as a foot in the door for full-time employment with Google. Even if you don't win, it might be a good way to get an interview.

    --

    -beme
    1971
  97. Re:Dogs are better than cats by Alsee · · Score: 2

    "I love dogs" 15,300 hits
    "I love cats" 27,900 hits

    If I hadn't done the "I love" query the dogs would have won, but now I am just confused. Which is the most popular??


    And some more results, just to increase the confusion level:
    "dogs love me" 439 hits
    "cats love me" 126 hits

    -

    --
    - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  98. I wish I could code this by cascadefx · · Score: 4, Interesting

    I hope Google reads these pages and gets some free ideas from it. At least take mine! Please. God knows that I don't have the coding chops to do it myself. I sent this same idea to Allaire (remember them) a long time ago and I had a couple of software engineers write me back, but nothing ever came of it. My guess is that this is a hard problem.

    I want a browser control/plugin/whatever that harnesses a backend of web information to make my surfing more productive/predictive.

    The gist would be to have a hover option for links which would give you information about what is behind the link without having to actually follow it. While browsing, the user would just hover over a link in a page and information pertaining to the page beyond the link would show up in a hovering menu or a sidebar (this would be great with Mozilla, but I could see an ActiveX control as well).

    The type of information is where it gets useful. Using some of the more advanced summarization algorithms out there, it would pull up the summaries of those pages if they were in the offsite database (Allaire, Google, and the WayBack Machine being possible backends). Based on your preferences a short, medium or long summary would be displayed. If it wasn't in the cache, it could be summarized on the fly and then presented after some delay (the new summary now being cached).

    It would also list, in an orderly way and subject to preferences, links from the page on the other side. That way the user could follow one of those if it turns out that she only needed the summary and a link. It would also list the elements of the page, like graphics, and give their specs (i.e. dimensions and estimated download times and ALT tag entries if present) and give the option to display them on a page by page basis. All of this would be nested, of course, so that a user could hover over links in the summary pages and get the same information all over again for that link (which is why I see it more as a "sidebar" feature). Theoretically a user could just surf by these summaries if they wanted.

    Now, I realize that this would pose some problems like trusting the summaries and so forth. However, the nice thing about it would be features that could be built into the user's preferences. For instance, you could make it so that the user could have certain words or phrases set that would then be scanned for during the summarization process. You could then either relax the amount of summary for the entire page or, better yet, still pull the cached summary but also pull a user-definable number of lines before and after their keywords (best of both worlds).

    Each summary could also list a numeric rank of where that page fits in "status" (like google's ranking system) based on the summary (generically) or the keywords of the user (specifically). Finally, it could pay for itself with text advertising (small and innocuous like the ones seen on Google).

    If you start to think about it for a while, there are all sorts of things you could do with this and it would help cut through the "padding" that you usually go through while looking for information on a certain subject. I think it would be great! It is kind of based on the idea of the "magic spyglass" that was heralded almost a decade ago, but never implemented in any OS that I know of.

    Like I said, I can't code it, but I would love to see it done. So have at it if you think it is good. Google's cache of pages and images and its ranking technology make it perfectly suited for this type of problem and they have enough PhDs that the summarization issue should prove an "interesting" problem to solve.

    Then again, it might suck. If you do implement it, let me know. I would love to beta-test it. I called the whole thing the Clairvoyant Browser Plugin... but you could use what you want.

    1. Re:I wish I could code this by AntiFreeze · · Score: 2

      Take a look at www.alexa.com, that's pretty much what they're all about.

      --

      ---
      "Of course, that's just my opinion. I could be wrong." --Dennis Miller

    2. Re:I wish I could code this by cascadefx · · Score: 2

      I disagree. Thanks for the link, though.

      The Alexa toolbar doesn't do summaries and doesn't nest data. It also doesn't break apart the features of the website, like links, graphics, and plugins, and give you the option of viewing the page with/without them should you choose to click through. It also doesn't allow you to define session keywords and phrases and then modify its behavior for your personal browsing session.

      The things it does do are flawed. First, its rating system is based on site traffic generated by other Alexa users. While it could be argued that it is a random sample and therefore statistically accurate, I doubt it. Google's ranking system is more democratic... and either way it seems to work for the most part.

      I think this plugin would have great promise if it used google technology and resources on the backend. I would be excited if they could get it running.

      Thanks for your info though. I hadn't been there for a while and didn't know if they were still around. It is sad that they still haven't coded something more interesting than a user-profiler, though. The back-end archive is the only thing that is exciting in my book. Maybe google can buy Alexa's archive in the future... they bought Deja after all.

  99. And how much does Google charge? by Hoi+Polloi · · Score: 2, Insightful

    I think it is funny that people are complaining that Google is getting something for nothing. I could say the same about everyone who uses its FREE search engine.

    --
    It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
  100. Re:Odd attitude that I'm seeing here by glwtta · · Score: 2

    come up with an innovative idea worth patenting

    Am I the only one who remembers the good old days when "inventions" and "devices" were patentable, rather than ideas (at least ostensibly)?

    --
    sic transit gloria mundi
  101. Oompa Loompa Googledy Doo! by rbeattie · · Score: 2

    Oompa loompa googledy doo
    I've got a perfect puzzle for you
    Oompa loompa googledy dee
    If you are wise you'll listen to me

    What do you get when you use the web too much
    Browsing all day and getting a gut
    What are you at, getting terribly fat
    What do you think will come of that
    I don't like the look of it

    Oompa loompa googledy da
    If you're a good hacker, you will go far
    You will live in Menlo Park too
    Like the Oompa Loompa Googledy do
    Googledy do

    Oompa loompa googledy doo
    I've got another portal for you
    Oompa loompa doompeda dee
    If you're "Feeling Lucky" you'll listen to me

    Programming's fine when it's once in a while
    It earns you lots of money and keeps you in style
    But it's repulsive, revolting and wrong
    Programming and hacking all day long
    The way that a geek does

    Oompa loompa googledy da
    Given good bandwidth you will go far
    You will live in Menlo Park too
    Like the Oompa Loompa Googledy do

    Oompa loompa googledy doo
    I've got another feature for you
    Oompa loompa googledy dee
    If you are wise you'll program with me

    Who do you blame when your program is slow
    Unscalable and bloated like a Hindu cow
    Blaming the admins is a lie and a shame
    You know exactly who's to blame
    Only the de-ve-lo-per

    Oompa loompa googledy da
    If you're not spoiled then you will go far
    You will live in Menlo Park too
    Like the Oompa Loompa Googledy do

    Oompa loompa googledy doo
    I've got another search for you
    Oompa loompa doompeda dee
    If you are wise you'll advertise with me

    What do you get from a glut of TV
    A pain in the neck and an IQ of three
    Why don't you try simply searching the web
    Or could you just not bear to look
    You'll get no
    You'll get no
    You'll get no
    You'll get no
    You'll get no commercials

    Oompa loompa googledy da
    If you like programming you will go far
    You will live in Menlo Park too
    Like the - Oompa -
    Oompa Loompa Googledy do

    (With all due respect to Leslie Bricusse and Anthony Newley http://gunther.simplenet.com/v/data/theoompa.htm )

    --
    Me
  102. I'd rather watch searches by Hoi+Polloi · · Score: 2, Insightful

    I'd be more interested in compiling search entry data and analyzing it for trends, etc. I'm sure Google does this already. Studying that would say more about what people are interested in on a day to day basis than webpages.
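
    As a rough sketch of what "compiling search entry data" might mean in practice, the snippet below counts the most frequent queries per day. Everything here is assumed for illustration: the log format (pairs of day and query string), the sample entries, and the name daily_top_queries. It says nothing about how Google actually does it.

```python
from collections import Counter, defaultdict


def daily_top_queries(log, top_n=3):
    """Map each day to its top_n most frequent normalized queries."""
    per_day = defaultdict(Counter)
    for day, query in log:
        # Normalize case and whitespace so trivial variants count together.
        normalized = " ".join(query.lower().split())
        per_day[day][normalized] += 1
    return {day: counts.most_common(top_n) for day, counts in per_day.items()}


# Hypothetical log entries, invented for this example.
log = [
    ("day-1", "Olympics"),
    ("day-1", "olympics "),
    ("day-1", "mp3"),
    ("day-2", "mp3"),
]
trends = daily_top_queries(log)
```

    Comparing one day's counts against the next is where the "trends" part would come in.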

    --
    It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
    1. Re:I'd rather watch searches by Ziviyr · · Score: 2

      You mean like their zeit-thingy? :-)

      http://www.google.com/press/zeitgeist.html

      --

      Someone set us up the bomb, so shine we are!
  103. This is how Google started by John+Harrison · · Score: 2
    Feel free to correct me if I am wrong, but I remember that in 1996 or 1997 there was an interesting database class at Stanford. The premise of the class was, "We have a very large database of the text of a bunch of web pages and of the links between those pages. This class will explore things that you could do with that database." Basically everyone that took the class came up with their own project to do some sort of interesting searches on this data. The group that put the class together had a demo webpage at http://google.stanford.edu.

    One of my friends tried to get me to take the class but I refused. I think my reason was that Jeffrey Ullman was associated with the course somehow and I couldn't stand him. His books were ok, but the few times that I went in to get help from him he was totally condescending. I decided never to take a class from him again. Interesting how some people who are so smart think that their smarts make up for their complete lack of courtesy and/or patience. So that is how I missed out on having something to do with Google. Aren't I lame? Yes Andy, I know you told me to take it.

  104. Re:What a load of horseshit by WNight · · Score: 2

    There isn't international copyright law, but there're international treaties to ensure that countries have similar copyright laws.

    And material to be copyrighted doesn't have to be written down, it has to be "fixed in tangible media" or something similar. As in, you can't have just said it to a friend once.

    Here's a quote "Under the Copyright Act of 1976, the basis of U.S. copyright law, copyright is automatic when an original work is first "fixed" in a tangible medium of expression. That means material is protected by copyright at the point when it is first printed, captured on film, drawn, or saved to hard drive or disk."

    I'm merely counting on the wording being utilitarian and the quotation short enough that it's not a violation to quote it. :)

    But it's not a stretch for someone to believe that it had to be printed, until ten years ago I'm sure that's what most lawyers said, not knowing there was another way to make most things tangible...

  105. Re:What a load of horseshit by WNight · · Score: 2

    You can't copyright ideas, but you can sue people for many different things if they use yours in a situation where you could reasonably have expected to get paid for giving them the idea.

    In this case, you could expect them to use the winning idea (being the best and all) and not the rest. If they pay for one idea they use, you've got a case that they should pay for the rest.

    Similarly, the writers of shows are often forbidden by their lawyers from looking at ideas from fans; anything more complex than "Make the Enterprise fight more Klingons" is off limits. Now, as much as I dislike lawyers, they do have a good idea of the current legal climate and likely wouldn't tell their clients to do something like that unless it served a legal purpose, which must mean a few companies have been sued over it (and lost).

  106. Re:RTFM by WNight · · Score: 2

    That should have been "RTFR" or "RTFL" and either way, I don't think it matters all that much.

    I've said more in replies to the other posts, but the summary is that you could probably sue them if they used this as a ruse to get free ideas and code. Likely though they'll hire anyone who does well in the contest, making it a moot point.

    Most ideas though aren't valuable because of the idea, but because of the development. Coming up with funky ideas of what to do with a DB the size of theirs is easy; it's the merging of the idea and the reality where the work really comes in. So even if they did take the losing ideas it wouldn't help them tons, they'd just have some undeveloped ideas, much the same as what I'm sure they get emailed every day: "Hey, have you guys thought of adding ... "

  107. It's a test by DahGhostfacedFiddlah · · Score: 2

    This is the *first* Google programming competition, so they've obviously never done this before. If I were a stockholder, I don't know if I'd be happy about my company offering $100k, $1M, whatever - in a plan that may not generate anything at all. I suspect that if this is as successful as I expect it to be, you may see that kind of money being thrown around by a lot of companies in the future. Imagine a world where you could make enough money to live on just by winning competitions companies put out...

  108. Perfect! by hobbit · · Score: 2, Funny

    Whilst you're at it, why not write a program which comes up with ideas for next year's annual google programming contest, using one part randomness, one part cleverness?!

    --
    "Wise men talk because they have something to say; fools, because they have to say something" - Plato