Slashdot Mirror


Google Launches Google Sitemaps

Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."

14 of 223 comments

  1. great interview by professorhojo · · Score: 5, Informative

    for more crunchy detail, here's a great Q&A interview i found with Shiva Shivakumar, engineering director and the technical lead for Google Sitemaps:

    http://blog.searchenginewatch.com/blog/050602-195224

  2. Creative Commons Meme by broward · · Score: 2, Informative

    It's not surprising that Google is using a Creative Commons license. The meme has been steadily gaining strength for over a year.

    http://www.realmeme.com/miner/preinflection.php?startup=/miner/preinflection/creativecommonscontentDejanews.png

  3. Re:Cool idea by Eric+Giguere · · Score: 3, Informative

    Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...

    Eric
    Currently at #1 for adsense tips
  4. Re:Sitemaps abuse? by Anonymous Coward · · Score: 1, Informative

    Look at the schema. None of the content of a sitemap file has anything to do with the content of your pages. It is all metadata -- url, last modified time, expected modification frequency, etc -- meant to help crawlers find your pages and be smarter about keeping their index/cache up to date with a minimum expenditure of bandwidth.
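
    A minimal sketch of what such a metadata-only file could look like when built from Python. The element names (urlset, url, loc, lastmod, changefreq, priority) follow the Sitemaps protocol; the build_sitemap helper and the example entry are hypothetical, and the namespace string is assumed to be the one for the 0.84 schema rather than taken from this thread.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the 0.84 schema published with Google Sitemaps.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    Each entry is a dict with a 'loc' key and optional 'lastmod',
    'changefreq' and 'priority' keys. Only metadata goes in; no page content.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    print(build_sitemap([
        {"loc": "http://www.example.com/", "lastmod": "2005-06-03",
         "changefreq": "daily", "priority": 0.5},
    ]))
```

    Every field except loc is optional here, which fits the idea that these values are hints for the crawler and not part of the page itself.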

  5. Re:Cool idea by rehannan · · Score: 4, Informative

    I just put a new site online. About 4 or 5 days after submitting it to google, it was the number one hit when searching for the title of the site.

  6. Re:Sitemaps abuse? by ArbitraryConstant · · Score: 3, Informative

    Well, I noticed two things about it...

    First, the priority is a relative priority, so if you want to set every page to 1.0 (defined as the highest priority) it'll mean nothing.

    Second, if you lie about update frequency or the date of the last update they'll figure it out pretty quick.

    These aren't commands, they're hints.

    --
    I rarely criticize things I don't care about.
  7. Re:Google is IT's Willy Wonka by Anonymous Coward · · Score: 1, Informative

    I work at Google.

    I envision the interior of Google as this huge warehouse full of oversized transistors, data streams with paddleboats, waterfalls of caffeinated beer, chairs contoured like a keyboard key, where diminutive men in green hair sing songs about electrons and logic gates and if you wander into the room where Duke Nukem 3D is being tested you'll be thrown out.

    I'm actually 6'2", and my hair is brown, and it's Duke Nukem Forever, but otherwise you're right on.

  8. Re:Next thing you know... by TuringTest · · Score: 3, Informative

    And the next thing you know will be Google launching specs on web design and then content.

    As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?

    --
    Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
  9. Re:Google is mightier than slashdot by Gnascher · · Score: 2, Informative

    Au contraire ... Google is returning a 502 on the provided link. Slashdot killed Google. Too bad too ... I wanted to read about this stuff...

    --
    It's not my fault! It was this way when I got here.
  10. Re:Cool idea by singleantler · · Score: 4, Informative

    It's quite common to be high up for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites. It is nicknamed the Google sandbox, and it seems to have been confirmed by the patent application recently made public.

    The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.

    --
    "What if they're using IE?" "I've dumbed Mozilla down to cope with it." - BOFH
  11. Re:Cool idea by mgbaron · · Score: 3, Informative

    I think I can shed a little light on this situation as I have had both of the above cases happen to me.

    This is how the system works. Google can index your site very quickly (within a couple of days), if you have an incoming link or submit to their crawler. If your site is well keyword optimized for a fairly rare keyword, it is entirely plausible that it would come up number one fairly quickly.

    What takes a long time is for Google to update their PageRank index. This is where your site will sit in the Google Sandbox for a while as Google updates your PageRank.

    In most cases, the site's initial PageRank of 0 will not be enough to take it to the top.

    For a site that we just released about 10 days ago, this was not the case (http://www.jimschlessinger.com/). Since the keywords we were optimizing were fairly rare, it climbed right to the top.

  12. Re:Google is IT's Willy Wonka by Wee · · Score: 2, Informative

    I never saw any paddleboats, but they did have a keg of beer outside the cafe yesterday. And there's no shortage of caffeinated drinks in the mini-kitchens.

    I can neither confirm nor deny the existence of any secret video game testing rooms.

    -B

    --

    Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

  13. SiteMaps Generator crashed our server! by Un-Thesis · · Score: 1, Informative

    I had a very bad experience with the Python sitemap generator from SourceForge using the 'accesslog' option. I plugged in a 10MB access log from our corporate site, Great Seats to Sold-out Events, which has ~22,000 pages.

    Within five minutes it crashed my development server, a 3200 MHz Pentium 4 with 2GB of RAM running Debian Linux. Just imagine if this had been the production server...the costs for over-utilizing the webserver.

    For the details, see http://www.incendiary.ws/node/94 Please syndicate my content if you want :-)

    --
    Promote freedom; fight fascism.
  14. insight into unlinked directories by e**(i+pi)-1 · · Score: 2, Informative

    I had been writing a primitive sitemap generator myself using shellscripts
    essentially using "find" and "grep" alone, but this tool is much better,
    faster and easy to configure. Cool.

    Note that this tool will allow google to reach files which never would be
    found by spidering a site, because the files are not linked. If you
    include something like

    <directory path="/var/www/html" url="http://www.example.com/" />

    in your config.xml and run "sitemap_gen.py" on it, you will give the world
    access to a large amount of material
    (like test versions of your website or source code you did not want to
    make accessible). We might see a lot more material which had been
    'hidden'.
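
    To see the effect before publishing a sitemap, something like the following sketch can list every URL a plain directory walk would expose. It is not how sitemap_gen.py works internally; list_exposed_urls and its drop_patterns parameter are hypothetical names used only for illustration, and the paths are the example ones from above.

```python
import fnmatch
import os
from urllib.parse import quote


def list_exposed_urls(root_dir, base_url, drop_patterns=()):
    """Map every file under root_dir to a URL under base_url.

    Files whose path relative to root_dir matches any shell-style
    pattern in drop_patterns are skipped.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            rel = rel.replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, pat) for pat in drop_patterns):
                continue
            urls.append(base_url.rstrip("/") + "/" + quote(rel))
    return sorted(urls)


if __name__ == "__main__":
    # Review this list by hand: anything printed here would be advertised,
    # whether or not a page on the site links to it.
    for url in list_exposed_urls("/var/www/html", "http://www.example.com/",
                                 drop_patterns=["test/*", "*.bak"]):
        print(url)
```

    Comparing this list against what a spider can reach by following links shows which files were only 'hidden' by being unlinked.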