Google Launches Google Sitemaps

Posted by Zonk on Friday June 3, 2005 @02:55AM from the please-stop-innovating dept.

Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
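
The summary says sitemaps are submitted as "basic XML". The Python below is a minimal sketch of writing one, using the four per-URL fields mentioned later in the thread (loc, lastmod, changefreq, priority). The namespace URI is an assumption: it is the later sitemaps.org 0.9 URI, and the 2005 Google schema may use a different one, so check the protocol FAQ. `UrlEntry` and `build_sitemap` are hypothetical names, not part of Google's Python client.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed namespace: the later sitemaps.org 0.9 URI. The 2005 Google schema
# may use a different one; check the protocol FAQ.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def q(tag: str) -> str:
    """Qualify a tag name with the sitemap namespace (ElementTree's {uri}tag form)."""
    return f"{{{SITEMAP_NS}}}{tag}"


@dataclass
class UrlEntry:  # hypothetical helper type
    loc: str                          # absolute URL of the page
    lastmod: Optional[str] = None     # W3C date, e.g. "2005-06-03"
    changefreq: Optional[str] = None  # e.g. "daily", "weekly", "monthly"
    priority: Optional[float] = None  # 0.0-1.0, relative to this site only


def build_sitemap(entries: Iterable[UrlEntry]) -> str:
    """Return a sitemap document as a string, omitting any field left as None."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default xmlns
    urlset = ET.Element(q("urlset"))
    for entry in entries:
        url = ET.SubElement(urlset, q("url"))
        ET.SubElement(url, q("loc")).text = entry.loc  # '&' is escaped on output
        if entry.lastmod is not None:
            ET.SubElement(url, q("lastmod")).text = entry.lastmod
        if entry.changefreq is not None:
            ET.SubElement(url, q("changefreq")).text = entry.changefreq
        if entry.priority is not None:
            ET.SubElement(url, q("priority")).text = str(entry.priority)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

ElementTree entity-escapes characters such as "&" in URLs when serializing, which the protocol expects.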

29 of 223 comments

great interview by professorhojo · 2005-06-03 02:56 · Score: 5, Informative

for more crunchy detail, here's a great Q&A interview I found with Shiva Shivakumar, engineering director and technical lead for Google Sitemaps:

http://blog.searchenginewatch.com/blog/050602-195224
More unabashed Google loving... by sachmet · 2005-06-03 02:56 · Score: 5, Funny

Everyone else defines a protocol. But apparently Google defines protocools.

I guess the rest of the world has a long way to go to catch up...
Cool idea by aftk2 · 2005-06-03 03:00 · Score: 4, Interesting

This is a cool idea, because I've often wondered about being able to "talk" to search engines at a slightly higher level than robots.txt allows.

For example, a website we launched a couple months ago is primarily images. We played nice - all of the images have legitimate alt tags, and we tried to let the site degrade properly in older browsers (although you really wouldn't get much, in those instances).

But the biggest problem we had was trying to get the site spidered by Google. It would be spidered, and it would appear in the index, but it would be listed far below sites that linked to it. I don't believe Google likes sites that are primarily images. We populated meta tags with descriptions, but they weren't included; we even tried using hidden text - legitimate hidden text that would serve as the site's description without breaking the design - but you know how Google feels about those sorts of things. We had to walk a fine line. This'll be nicer.

--
concrete5: a cms made for marketing, but strong enough for geeks.
1. Re:Cool idea by Eric+Giguere · 2005-06-03 03:12 · Score: 3, Informative
  
  Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...
  Eric
  Currently at #1 for adsense tips
2. Re:Cool idea by rehannan · 2005-06-03 03:25 · Score: 4, Informative
  
  I just put a new site online. About 4 or 5 days after submitting it to google, it was the number one hit when searching for the title of the site.
3. Re:Cool idea by singleantler · 2005-06-03 04:16 · Score: 4, Informative
  
  It's quite common to be high up for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites; it is nicknamed the Google sandbox and seems to have been confirmed by the recently published patent application.
  
  The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.
  
  --
  "What if they're using IE?" "I've dumbed Mozilla down to cope with it." - BOFH
4. Re:Cool idea by mgbaron · 2005-06-03 04:54 · Score: 3, Informative
  
  I think I can shed a little light on this situation as I have had both of the above cases happen to me.
  
  This is how the system works. Google can index your site very quickly (within a couple of days), if you have an incoming link or submit to their crawler. If your site is well keyword optimized for a fairly rare keyword, it is entirely plausible that it would come up number one fairly quickly.
  
  What takes a long time is for Google to update its PageRank index. This is where your site will sit in the Google Sandbox for a while as Google updates your PageRank.
  
  In most cases, the site's initial PageRank of 0 will not be enough to take it to the top.
  
  For a site that we just released about 10 days ago, this was not the case (http://www.jimschlessinger.com/). Since the keywords we were optimizing were fairly rare, it climbed right to the top.
  
  --
  Could Jesus microwave a burrito so hot that he himself cou
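
The PageRank point in this subthread can be made concrete with the textbook power-iteration formulation. This is a minimal sketch under assumed values (damping factor 0.85, a made-up four-page link graph), not Google's production ranking. It shows why a brand-new page with no inbound links stays at the bottom: it only ever receives the (1 - d)/N floor that every page gets.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Every page gets a
    (1 - damping) / N floor; the rest of the rank flows along links.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank sitting on pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank
```

With a hypothetical graph `{"old-a": ["old-b"], "old-b": ["old-a", "old-c"], "old-c": ["old-a"], "new-site": ["old-a"]}`, "new-site" links out but nothing links to it, so its rank stays at about (1 - 0.85)/4 = 0.0375, the lowest of the four.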
fuckedgoogle.com anyone? by Anonymous Coward · 2005-06-03 03:02 · Score: 2, Interesting

Had to say it:

http://www.fuckedgoogle.com/
Sitemaps abuse? by iolagnm · 2005-06-03 03:03 · Score: 3, Insightful

It will take a company with as much influence as Google to really promote XML sitemaps, which could lead to a great thing... but what is to stop them from becoming like meta tags, where companies just flood them with useless keywords and entries in an attempt to get better search rankings?
1. Re:Sitemaps abuse? by drnlm · 2005-06-03 03:23 · Score: 2, Interesting
  
  That's really up to the search engine implementation, isn't it?
  Anyway, a brief look at the proposed format shows very little scope for abuse - you can specify location, change frequency, last modified date and a priority, and that's it. The priority is specified as applying only to URLs from the same site, so what you can do with it is fairly limited. Overall, it looks like it was written as a set of additional hints to spiders crawling the site.
2. Re:Sitemaps abuse? by ArbitraryConstant · 2005-06-03 03:54 · Score: 3, Informative
  
  Well, I noticed two things about it...
  
  First, the priority is a relative priority, so if you want to set every page to 1.0 (defined as the highest priority) it'll mean nothing.
  
  Second, if you lie about update frequency or the date of the last update they'll figure it out pretty quick.
  
  These aren't commands, they're hints.
  
  --
  I rarely criticize things I don't care about.
3. Re:Sitemaps abuse? by Jellybob · 2005-06-03 03:55 · Score: 2, Interesting
  
  Using XHTML this shouldn't be too hard - something along the lines of:
  <goog:index> Stuff that actually matters </goog:index> Advertising crap which people don't care about.
  It's not going to fix the problem on sites that are doing this deliberately, but for those of us who actually care about getting indexed relevantly it would be great.
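
The "hints, not commands" points above (priority is relative within a site; a false update frequency is easy to spot) can be illustrated with a hypothetical crawler-side check. The heuristics, the day length assigned to each frequency and the tolerance factor are assumptions for illustration, not Google's actual logic; only the list of allowed changefreq values comes from the published protocol.

```python
from statistics import median

# The seven changefreq values listed in the published protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Assumed length, in days, of each declared period (illustrative only).
PERIOD_DAYS = {"hourly": 1 / 24, "daily": 1, "weekly": 7, "monthly": 30, "yearly": 365}


def priority_spread(priorities):
    """Max minus min priority across one site's URLs.

    Priority is relative within a site, so a spread of 0.0 (e.g. every
    page at 1.0) gives a crawler no ordering information at all.
    """
    for p in priorities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"priority out of range: {p}")
    return max(priorities) - min(priorities)


def changefreq_plausible(declared, change_times_days, tolerance=4.0):
    """Hypothetical check: does the observed change rate roughly match the claim?

    change_times_days: sorted times (in days) at which the crawler saw the page
    content differ from its previous fetch. tolerance (assumed) is how many
    times slower than declared a page may change before the hint is distrusted.
    """
    if declared not in VALID_CHANGEFREQ:
        raise ValueError(f"unknown changefreq: {declared!r}")
    if declared in ("always", "never") or len(change_times_days) < 2:
        return True  # not judged here: no fixed period, or too little evidence
    gaps = [later - earlier for earlier, later in zip(change_times_days, change_times_days[1:])]
    return median(gaps) <= PERIOD_DAYS[declared] * tolerance
```

A site that marks every page 1.0 gets a spread of 0.0, and a page declared "hourly" that actually changes once a month fails the plausibility check.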
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out by Mant · 2005-06-03 03:06 · Score: 2, Insightful

Well, maybe if Google stopped doing stuff for a while?

Lots of slashdotters seem interested in what Google does, either because it tends to be neat, or so they can worry about privacy and the info Google potentially has access to.
Google is IT's Willy Wonka by stlhawkeye · 2005-06-03 03:07 · Score: 5, Funny

I envision the interior of Google as this huge warehouse full of oversized transistors, data streams with paddleboats, waterfalls of caffeinated beer, chairs contoured like a keyboard key, where diminutive men with green hair sing songs about electrons and logic gates and if you wander into the room where Duke Nukem 3D is being tested you'll be thrown out.

--
"I have never won a debate with an ignorant person." -Ali ibn Abi Talib
1. Re:Google is IT's Willy Wonka by Wee · 2005-06-03 05:17 · Score: 2, Informative
  
  I never saw any paddleboats, but they did have a keg of beer outside the cafe yesterday. And there's no shortage of caffeinated drinks in the mini-kitchens.
  I can neither confirm nor deny the existence of any secret video game testing rooms.
  
  -B
  
  --
  Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
Creative Commons Meme by broward · 2005-06-03 03:08 · Score: 2, Informative

It's not surprising that Google is using a Creative Commons license. The meme has been steadily gaining strength for over a year.

http://www.realmeme.com/miner/preinflection.php?startup=/miner/preinflection/creativecommonscontentDejanews.png
Google Evil Index by yotto · 2005-06-03 03:11 · Score: 5, Funny

In other news, the Google Evil Index went down 3.2 points today, and is currently at 13.8, the lowest it's been since right before the beta rollout of Google Web Accelerator.

--
Pulp Audio Weekly - Geek News and Reviews
Re:How does this benefit me? by Eric+Giguere · 2005-06-03 03:20 · Score: 4, Insightful
It benefits you because:
- Google will hopefully crawl your frequently-changing pages more often
- Conversely, Google won't crawl other pages as often, saving your bandwidth
- Google will find pages that it wouldn't normally find just by following links
Also, you wouldn't necessarily have to maintain more than one sitemap. You could use XSLT to create the sitemap.html file for your site from the XML file you create for Google. In fact, wouldn't it be nice for Web authoring tools to do this automatically for you?
Eric
Make Easy Money with Google: The Blog (powered by blojsom)
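
The suggestion above is to generate sitemap.html from the XML sitemap with XSLT. Python's standard library has no XSLT engine, so this minimal sketch swaps in ElementTree plus `html.escape` to do the same transformation. The namespace is the same assumed sitemaps.org 0.9 URI as the generator sketch, and `sitemap_xml_to_html` is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # assumed namespace


def sitemap_xml_to_html(xml_text, title="Site map"):
    """Turn a sitemap <urlset> into a plain HTML list of links (XSLT-free)."""
    root = ET.fromstring(xml_text)
    items = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        suffix = f" (updated {html.escape(lastmod)})" if lastmod else ""
        # Escape once for the attribute and once for the visible text.
        items.append(f'  <li><a href="{html.escape(loc, quote=True)}">{html.escape(loc)}</a>{suffix}</li>')
    head = f"<!DOCTYPE html>\n<html><head><title>{html.escape(title)}</title></head>\n"
    body = f"<body><h1>{html.escape(title)}</h1>\n<ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
    return head + body
```

An authoring tool could run this whenever it rewrites sitemap.xml, which keeps the human-readable and machine-readable maps from drifting apart.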
Or maybe another hidden use... by 823723423 · 2005-06-03 03:22 · Score: 4, Insightful

Navigation is sometimes the hardest part of the internet. A tree structure is sometimes the second-easiest way of searching or browsing for information (the first being keyword searching). So if more web designers set up server-side solutions, it could lower the burden on web designers. More importantly, it would move navigation away from web designers to users, just as Google displaced content from web designers onto searchers. Instead of overburdening web servers the way this Firefox extension (with screenshot) does, automatically generating a sitemap by crawling a site, sites could expose a sitemap through a favicon.ico-like convention or a link rel element pointing at sitemap.rdf or sitemap.xml, just as Netscape Navigator originally proposed a while back. I think web designers should pay attention, at least those who don't use Flash for their whole site. The web is slowly becoming a database of content rather than style. See the Webmonkey/Wired article on Netscape's sitemap feature (Sitemap RDF) or the sitemap slide from this seminar.
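
The discovery idea above (a client finds the sitemap from the page itself, or from a fixed favicon-style location) can be sketched as follows. This assumes the page declares `<link rel="sitemap" href="...">`; that rel value is the commenter's proposal, not a registered link type, and the `/sitemap.xml` fallback path and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "sitemap" (assumed rel value)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "sitemap" in rels and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_sitemaps(page_url, page_html, fallback="/sitemap.xml"):
    """Return absolute sitemap URLs declared in the page, else a favicon-style default."""
    finder = SitemapLinkFinder()
    finder.feed(page_html)
    finder.close()
    hrefs = finder.hrefs or [fallback]
    return [urljoin(page_url, h) for h in hrefs]
```

Checking the page first and only then guessing a well-known path mirrors how browsers look for a declared icon before falling back to /favicon.ico.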
Search Engine by Pac · 2005-06-03 03:53 · Score: 2, Funny

[To ELP's "Lucky Man"]

They had white pages
And hits by the score
All the people's queries
Waiting by the door

Ooooh, what a search engine it was
Ooooh, what a search engine it was

Many geeks and hackers
They made up its core
Everybody's dearest
A daily stop for more

Ooooh, what a search engine it was
Ooooh, what a search engine it was

It went to the market
Of the engines it was king
Of his honor and his glory
Slashdot would sing

Ooooh, what a search engine it was
Ooooh, what a search engine it was

A burst had found it
Its money dried as it sank
No praise could save it
So it vanished and it died

Ooooh, what a search engine it was
Ooooh, what a search engine it was
Re:How does this benefit me? by DigitalRaptor · 2005-06-03 04:01 · Score: 2, Interesting

Because when you launch a new site, or new section of your site, you create the site map and notify Google, rather than hoping some day they'll follow a link somewhere and come spider your site.

Google immediately knows that the site exists, immediately knows how many pages there are, how often they are supposed to change, AND what priority I place on them, so out of my 150 pages, the 10 I want spidered first are labeled as higher priority.

This makes total sense to me.

--
Lose Weight and Feel Great with Isagenix
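
The 150-page example above can be sketched as a priority assignment plus the crawl order a crawler might derive from it. The URL scheme and which pages count as important are hypothetical; 0.5 is the default priority in the published protocol, and "highest priority first" is an assumed crawler behaviour, not a documented one.

```python
def assign_priorities(urls, important, high=1.0, default=0.5):
    """Give the chosen pages a higher priority; everything else keeps the default."""
    return {u: (high if u in important else default) for u in urls}


def crawl_order(priorities):
    """Highest priority first; sorted() is stable, so ties keep sitemap order."""
    return sorted(priorities, key=lambda u: -priorities[u])
```

Because priority is relative within one site, the ten flagged pages move to the front only because the other 140 were left at the default.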
Re:Next thing you know... by TuringTest · 2005-06-03 04:14 · Score: 3, Informative

And the next thing you know will be Google launching specs on web design and then content.

As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Re:Google is mightier than slashdot by Gnascher · 2005-06-03 04:16 · Score: 2, Informative

Au contraire ... Google is returning a 502 on the provided link. Slashdot killed Google. Too bad too ... I wanted to read about this stuff...

--
It's not my fault! It was this way when I got here.
Re:Cool idea (they stole my idea?) by neanderlander · 2005-06-03 04:17 · Score: 2, Interesting

On February 16th I sent Google the following email at suggestions@google.com: Hi,
This is a suggestion for the people who take care of indexing web sites.
Because Google is the search engine of choice, it has enough influence to point noses in the same direction.
So, I propose a new element to be added to websites: a sitemap file. Similar to the favicon file, every site could have an (XML?) file containing information about the info and the info-topography on the site.
Google already has a 'similar pages' link added to search results. What about adding a 'show context' link? When it is clicked, a page is shown that provides info on where the search result is located on the site: the context of the information.
The sitemap file could also be used in Google's core indexing process, providing extra context to evaluate the validity of the indexed page.
Some other related advantages: Google could release a sitemap browser plugin for users. For example: open a site, and if the website contains the special sitemap file, a browser plugin is activated allowing the user to browse the website using their preferred navigational tool (instead of, or together with, any normal website menus).
I hope to hear from you.
Kind regards,
mynamehere
The Netherlands
They even used the term 'sitemaps'.
502 Server Error! by md17 · 2005-06-03 04:21 · Score: 3, Funny

OMG!!! We finally /.'d Google!
Re:Next thing you know... by phidipides · 2005-06-03 04:34 · Score: 4, Insightful

And thus Google will control the design, content and other things... HELP... they are taking over the internet

Nice. Google proposes a way to help web site administrators have a bit more control over how their site is perceived by a search engine, releases this proposal under an open source license, and at least a few people on slashdot accuse them of (*pinky to corner of mouth*) taking over the internet.

Most of Google's recent actions have been good things -- sponsoring open source developers for the summer, proposing ways for site administrators to provide additional info about their site, and implementing a "nofollow" option to stop spammers from boosting their page ranking. However, if they constantly get criticized and second-guessed for doing good things, what incentive do they have to continue? If you give a charity $20 and they criticize you for not giving them $30, are you ever going to give anything to that charity again?

Let's give Google the benefit of the doubt. Just like a person, they'll probably make some mistakes, but like a person I'll give them the benefit of the doubt until they prove me wrong. Some corporations do actually do good things and still manage to be successful, and in those cases they should be supported, not attacked.

--
JAMWiki Java-based Wiki engine
Why not just use rss/atom? by neves · 2005-06-03 05:00 · Score: 2, Insightful

My RSS feeds already publish my newest/freshest pages. Why didn't they just extend RSS with some additional attributes/tags instead of forcing me to implement another XML format?
1. Re:Why not just use rss/atom? by neves · 2005-06-03 06:48 · Score: 2, Interesting
  
  Silly me! Just found in their FAQ: you can use RSS/Atom as your sitemap format!
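
Since a feed can stand in for a sitemap, here is a minimal sketch of pulling (URL, last-modified) pairs out of an RSS 2.0 feed, which is roughly what a sitemap consumer needs. It reads the standard RSS 2.0 `item/link` and `pubDate` elements only; Atom is not handled, and how Google itself maps feed fields is not shown here. `feed_entries` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_entries(rss_text):
    """Return [(link, lastmod)] from an RSS 2.0 feed; lastmod is a W3C date or None."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # an item without a link can't become a sitemap URL
        pub = item.findtext("pubDate")  # RFC 822 date, e.g. "Fri, 03 Jun 2005 06:48:00 GMT"
        lastmod = parsedate_to_datetime(pub).date().isoformat() if pub else None
        entries.append((link, lastmod))
    return entries
```

A feed usually lists only the newest items, so it tells a crawler what changed recently rather than describing the whole site.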
insight into unlinked directories by e**(i+pi)-1 · 2005-06-03 06:47 · Score: 2, Informative

I had been writing a primitive sitemap generator myself using shell scripts built essentially on "find" and "grep" alone, but this tool is much better, faster and easier to configure. Cool. Note that this tool will allow Google to reach files that would never be found by spidering a site, because the files are not linked. If you include something like <directory path="/var/www/html" url="http://www.example.com/" /> in your config.xml and run "sitemap_gen.py" on it, you will give the world access to a large amount of material (like test versions of your website or source code you did not want to make accessible). We might see a lot more material that had been 'hidden'.
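
To see why a directory mapping exposes unlinked files, here is a minimal sketch of what a path-to-URL mapping like the one above does: every file under the document root becomes a URL, linked or not, unless something filters it out. The exclusion patterns and `directory_to_urls` are hypothetical; sitemap_gen.py's own configuration options are not reproduced here.

```python
import fnmatch
import os
from urllib.parse import quote

# Hypothetical patterns for material you may not want listed.
EXCLUDE = ("*/test/*", "*.bak", "*.py", "*~")


def directory_to_urls(root_path, base_url, exclude=EXCLUDE):
    """Map every file under root_path to a URL under base_url, skipping excluded paths."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root_path).replace(os.sep, "/")
            # fnmatchcase behaves the same on every OS (no case folding or slash rewriting).
            if any(fnmatch.fnmatchcase("/" + rel, pat) for pat in exclude):
                continue
            urls.append(base_url.rstrip("/") + "/" + quote(rel))  # quote keeps "/" as-is
    return urls
```

Without the exclusion list, test/ directories, backups and scripts under the web root would all be announced to the crawler, which is exactly the leak described in the comment.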