Google Launches Google Sitemaps
Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
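To make the XML part concrete, here is a minimal Python sketch that writes a sitemap using only the standard library. The four per-URL fields (loc, lastmod, changefreq, priority) are the ones the protocol describes. The namespace URI is an assumption (the 0.84 value from Google's original schema), so check it against the FAQ before submitting anything. `build_sitemap` is a made-up helper name, not part of Google's Python client.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace from Google's original (0.84) sitemap schema.
# Verify against the Sitemaps FAQ before relying on it.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Return sitemap XML as a string (hypothetical helper).

    entries: iterable of dicts with a required "loc" key and optional
    "lastmod" (W3C date string), "changefreq" and "priority" keys.
    The XML declaration is omitted; it is optional for UTF-8 documents.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Emit tags in the schema's usual order, skipping absent optional ones.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.example.com/", "lastmod": "2005-06-02",
         "changefreq": "daily", "priority": 0.8},
        {"loc": "http://www.example.com/about.html", "changefreq": "monthly"},
    ]))
```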
For more crunchy detail, here's a great Q&A interview I found with Shiva Shivakumar, engineering director and technical lead for Google Sitemaps:
http://blog.searchenginewatch.com/blog/050602-195
It's not surprising that Google is using a Creative Commons license. The meme has been steadily gaining strength for over a year.
http://www.realmeme.com/miner/preinflection.php?startup=/miner/preinflection/creativecommonscontentDejanews.png
Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...
Eric
Currently at #1 for adsense tips
Look at the schema. None of the content of a sitemap file has anything to do with the content of your pages. It is all metadata -- url, last modified time, expected modification frequency, etc -- meant to help crawlers find your pages and be smarter about keeping their index/cache up to date with a minimum expenditure of bandwidth.
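A rough sketch of how a crawler might use that metadata to save bandwidth: compare each URL's lastmod with the date of its own cached copy and refetch only what has changed. The crawler state (`last_crawled`) and the function name are hypothetical, and the namespace is the same assumed 0.84 value.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.google.com/schemas/sitemap/0.84"}  # assumed namespace


def urls_to_refetch(sitemap_xml, last_crawled):
    """Return URLs whose <lastmod> is newer than our cached copy.

    last_crawled maps URL -> datetime.date of the cached copy (hypothetical
    crawler state). URLs never crawled, or without a <lastmod>, are always
    returned, since the sitemap gives no reason to skip them.
    """
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        cached = last_crawled.get(loc)
        if cached is None or lastmod is None:
            stale.append(loc)
        # Only the date part (YYYY-MM-DD) of lastmod is compared here.
        elif date.fromisoformat(lastmod[:10]) > cached:
            stale.append(loc)
    return stale
```

Everything in the returned list gets fetched; everything else is served from cache, which is the bandwidth saving the comment describes.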
I just put a new site online. About 4 or 5 days after submitting it to google, it was the number one hit when searching for the title of the site.
Well, I noticed two things about it...
First, the priority is relative, so if you set every page to 1.0 (defined as the highest priority) it won't mean anything.
Second, if you lie about the update frequency or the date of the last update, they'll figure it out pretty quickly.
These aren't commands, they're hints.
I rarely criticize things I don't care about.
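Both points can be illustrated with two small crawler-side checks. These are hypothetical policies written for illustration, not anything Google has published. The first turns declared priorities into shares of a site's own crawl attention, so marking every page 1.0 gives exactly the same result as marking every page 0.5. The second distrusts a declared changefreq only when repeated fetches show the page changing far less often than claimed. The keyword-to-period table and the `slack` factor are assumptions.

```python
from datetime import datetime, timedelta

# Rough period for each changefreq keyword (an assumption for this sketch;
# "always" and "never" are deliberately left out and never flagged).
PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def relative_weights(priorities):
    """Turn one site's declared priorities (URL -> 0.0..1.0) into shares of
    that site's crawl attention. Only the ratios matter, so uniform values
    carry no information."""
    total = sum(priorities.values())
    if total == 0:
        return {url: 1 / len(priorities) for url in priorities}
    return {url: p / total for url, p in priorities.items()}


def changefreq_overstated(declared, observations, slack=4):
    """Return True if a page changes much less often than its sitemap claims.

    observations: (fetch_time, content_hash) pairs in fetch order. The hint
    is distrusted only when the average interval between observed changes
    exceeds `slack` times the declared period.
    """
    period = PERIODS.get(declared)
    if period is None or len(observations) < 2:
        return False  # unknown keyword or not enough history to judge
    changes = sum(1 for (_, prev), (_, cur) in zip(observations, observations[1:])
                  if cur != prev)
    span = observations[-1][0] - observations[0][0]
    # With no change seen at all, the whole span is a lower bound on the interval.
    observed = span / max(changes, 1)
    return observed > period * slack


if __name__ == "__main__":
    print(relative_weights({"/a": 1.0, "/b": 1.0}) == relative_weights({"/a": 0.5, "/b": 0.5}))
    start = datetime(2005, 6, 1)
    # Ten daily fetches of a page declared "daily" that never changed.
    unchanged = [(start + timedelta(days=i), "same-hash") for i in range(10)]
    print(changefreq_overstated("daily", unchanged))
```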
I work at Google.
I envision the interior of Google as this huge warehouse full of oversized transistors, data streams with paddleboats, waterfalls of caffeinated beer, and chairs contoured like keyboard keys, where diminutive men with green hair sing songs about electrons and logic gates, and where, if you wander into the room where Duke Nukem 3D is being tested, you'll be thrown out.
I'm actually 6'2", and my hair is brown, and it's Duke Nukem Forever, but otherwise you're right on.
And the next thing you know, Google will be launching specs on web design, and then on content.
As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Au contraire ... Google is returning a 502 on the provided link.
Slashdot killed Google.
Too bad too ... I wanted to read about this stuff...
It's not my fault! It was this way when I got here.
It's quite common for a new site to rank high for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites; it is nicknamed the Google sandbox, and it seems to have been confirmed by the patent application that was recently made public.
The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.
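The "artificial lowering" explanation can be checked with a toy model: multiply a new site's relevance by some penalty and see where it lands. Nothing here reflects how Google actually ranks pages; the penalty value and the scores are invented purely to show why a sandboxed site can still come first when it is the only strong match for a rare term.

```python
def rank(results, sandboxed, penalty=0.3):
    """Toy ranking: sort (url, relevance) pairs best-first after scaling the
    relevance of any sandboxed URL by a made-up penalty factor."""
    def adjusted(item):
        url, relevance = item
        return relevance * penalty if url in sandboxed else relevance
    return [url for url, _ in sorted(results, key=adjusted, reverse=True)]


if __name__ == "__main__":
    new = {"new.example.com"}
    # Rare term: the new site is the only real match, so it still wins.
    print(rank([("new.example.com", 0.9), ("old.example.com", 0.1)], new))
    # Common term: established pages outrank the penalized newcomer.
    print(rank([("new.example.com", 0.9), ("a.example.com", 0.5),
                ("b.example.com", 0.4)], new))
```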
"What if they're using IE?" "I've dumbed Mozilla down to cope with it." - BOFH
I think I can shed a little light on this situation as I have had both of the above cases happen to me.
This is how the system works. Google can index your site very quickly (within a couple of days) if you have an incoming link or submit your site to their crawler. If your site is well keyword-optimized for a fairly rare keyword, it is entirely plausible that it would come up number one fairly quickly.
What takes a long time is for Google to update its PageRank index. This is where your site sits in the Google Sandbox while Google updates your PageRank.
In most cases, the site's initial PageRank of 0 will not be enough to take it to the top.
For a site that we released just about 10 days ago, this was not the case (http://www.jimschlessinger.com/). Since the keywords we were optimizing for were fairly rare, it climbed right to the top.
Could Jesus microwave a burrito so hot that he himself cou
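The point about a new site's low starting PageRank follows from the published PageRank formula. Below is the textbook power-iteration version (damping factor 0.85, with pages that have no outbound links spreading their rank evenly), not Google's production system. In it, a page that nobody links to only ever receives the baseline (1 - d)/N share, no matter how many iterations run; it has to earn inbound links to climb.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a small link graph.

    links maps each page to the list of pages it links to. Rank held by
    pages with no outbound links is spread evenly over all pages, so the
    scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Rank sitting on dangling pages is redistributed to everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the baseline share, even with no inbound links.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank


if __name__ == "__main__":
    # "new" links out to "a", but nothing links to "new" yet.
    graph = {"a": ["b"], "b": ["a"], "new": ["a"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```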
I can neither confirm nor deny the existence of any secret video game testing rooms.
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
I had a very bad experience with the Python sitemap generator from SourceForge using the 'accesslog' option. I plugged in a 10MB site log from our corporate site Great Seats to Sold-out Events, which has ~22,000 pages.
Within five minutes it crashed my development server, a 3200 MHz Pentium 4 with 2GB of RAM running Debian Linux. Just imagine if this had been the production server... the costs of over-utilizing the webserver
See http://www.incendiary.ws/node/94 for the details. Please syndicate my content if you want :-)
Promote freedom; fight fascism.
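For comparison, here is a sketch of reading an access log in a streaming way, so memory grows with the number of distinct URLs rather than with the size of the log. I don't know how sitemap_gen.py implements its accesslog option, so this is not a diagnosis of the crash. It assumes the Apache Common Log Format, and `urls_from_access_log` is a hypothetical name. Running something like this on a copy of the log on another machine also keeps the load off the web server.

```python
import re

# Request and status fields of the Apache Common Log Format, e.g.
# "GET /index.html HTTP/1.1" 200 (assumes the log uses that format).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" (\d{3})\b')


def urls_from_access_log(lines, base_url):
    """Yield each successfully served URL once, reading lines one at a time.

    Only 2xx GET/HEAD requests count; query strings are dropped.
    """
    seen = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match or not match.group(2).startswith("2"):
            continue
        path = match.group(1).split("?", 1)[0]
        if path not in seen:
            seen.add(path)
            yield base_url.rstrip("/") + path


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [02/Jun/2005:10:00:00 -0700] "GET /tickets.html?id=3 HTTP/1.1" 200 512',
        '1.2.3.4 - - [02/Jun/2005:10:00:05 -0700] "GET /missing.html HTTP/1.1" 404 0',
    ]
    # A real run would pass an open file object instead, e.g.
    # open("access.log", errors="replace"), which is also read line by line.
    for url in urls_from_access_log(sample, "http://www.example.com/"):
        print(url)
```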
I had been writing a primitive sitemap generator myself using shell scripts, essentially using "find" and "grep" alone, but this tool is much better, faster and easier to configure. Cool.
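A Python equivalent of that find/grep approach, as a minimal sketch: walk the document root and turn each HTML file's path into a URL. It assumes URL paths mirror the directory layout under the docroot (not true for every server setup), and `walk_docroot` is a hypothetical helper, not part of sitemap_gen.py.

```python
import os
import tempfile


def walk_docroot(docroot, base_url, extensions=(".html", ".htm")):
    """Map files under a document root to URLs, like a find(1)-based script."""
    base = base_url.rstrip("/")
    urls = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        dirnames.sort()  # sorting in place makes os.walk visit subdirs in order
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                rel = os.path.relpath(os.path.join(dirpath, name), docroot)
                urls.append(base + "/" + rel.replace(os.sep, "/"))
    return urls


if __name__ == "__main__":
    # Build a throwaway docroot so the example runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "news"))
        for rel in ("index.html", os.path.join("news", "2005.html"), "logo.png"):
            open(os.path.join(root, rel), "w").close()
        print(walk_docroot(root, "http://www.example.com/"))
```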
Note that this tool will allow Google to reach files that would never be found by spidering a site, because those files are not linked from anywhere. If you include something like
<directory path="/var/www/html" url="http://www.example.com/" />
in your config.xml and run "sitemap_gen.py" on it, you will give the world access to a large amount of material (like test versions of your website, or source code you did not want to make accessible). We might see a lot more material that had been 'hidden'.
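One way to reduce that risk is to filter the file list before anything is written into the sitemap. The exclusion patterns below are hypothetical examples; sitemap_gen.py's own config may offer something similar, so check its documentation. Either way, leaving a file out of the sitemap does not make it private: anything under the document root can still be fetched by whoever guesses or finds its URL.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion list: test copies, scripts, editor backups, dotfiles.
EXCLUDE = ["test/*", "*.py", "*.bak", "*~", ".*", "*/.*"]


def publishable(paths, exclude=EXCLUDE):
    """Drop docroot-relative paths matching any exclusion pattern.

    Note that in fnmatch patterns "*" also matches "/", so "test/*"
    covers everything below test/ and "*.py" matches at any depth.
    """
    return [p for p in paths if not any(fnmatchcase(p, pat) for pat in exclude)]


if __name__ == "__main__":
    print(publishable(["index.html", "test/index.html", "src/gen.py"]))
```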