Google Launches Google Sitemaps
Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
for more crunchy detail, here's a great Q&A interview i found with Shiva Shivakumar, engineering director and the technical lead for Google Sitemaps:
http://blog.searchenginewatch.com/blog/050602-195
Everyone else defines a protocol. But apparently Google defines protocools.
I guess the rest of the world has a long way to go to catch up...
This is a cool idea, because I've often wondered about being able to "talk" to search engines at a slightly higher level than robots.txt allows.
For example, a website we launched a couple months ago is primarily images. We played nice - all of the images have legitimate alt tags, and we tried to let the site degrade properly in older browsers (although you really wouldn't get much, in those instances).
But the biggest problem we had was trying to get the site spidered by Google. It would be, and it would appear in the index, but it would be listed far below sites that linked to it. I don't believe Google likes sites that are primarily images. We populated meta tags with descriptions, but they weren't included; we even tried using hidden text - legitimate, hidden text that would serve as the site's description, but not break the design - but you know how Google feels about those sorts of things. We had to walk a fine line. This'll be nicer.
concrete5: a cms made for marketing, but strong enough for geeks.
Good Luck convincing Microsoft to adopt a Google Standard into their enterprise web server product.
Sure, if I don't want to read about Google, don't open the article, I know. But I can't even do a search on the site here without now being reminded it's a "Google Slashdot." (See new button on bottom of this page.)
The Slashdot promotion of Google is reaching Onion-Level parody status, 'cept it's not a parody, it's real.
Just... rest it... mebbe a coupla two-three days, but just... rest it.
Had to say it:
http://www.fuckedgoogle.com/
...you're supposed to be evil!
The site is still in Beta! Is it launched while still in beta?
It will take a company with enough influence like Google to really promote XML sitemaps, which could lead to a great thing... but what is to stop them from becoming like MetaTags where companies will just flood them with useless keywords and entries in an attempt to get better search rankings?
My first thought was that this will really help bloggers. But not really, because the blogs updated most often are generally the ones already getting the most traffic anyway.
with /. taking a picture? You'd think that google is Slashdot's first born. Better ease up a bit before little Mac gets jealous.
I would love to see a new meta tag for address to become common. Could make things like Google local even more useful.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
About Google Sitemaps
1. What is Google Sitemaps?
Google Sitemaps is an experiment in web crawling. Using Sitemaps to inform and direct our crawlers, we hope to expand our coverage of the web and improve the time to inclusion in our index. By placing a Sitemap-formatted file on your webserver, you enable our crawlers to find out what pages are present and which have recently changed, and to crawl your site accordingly.
Basically, the two steps to participating in Google Sitemaps are:
1. Generate a Sitemap in the correct format using Sitemap Generator.
2. Update your Sitemap when you make changes to your site.
2. Who can use Google Sitemaps?
Google Sitemaps is intended for all web site owners, from those with a single web page to companies with millions of ever-changing pages. If either of the following are true, then you may be especially interested in Google Sitemaps:
* You want Google to crawl more of your web pages.
* You want to be able to tell Google when content on your site changes.
3. How much does it cost?
Absolutely nothing. Google has never charged for placement in our search results, and we don't have any plans to do so.
4. Why is Google doing this?
In alignment with Google's mission to organize the world's information and make it universally accessible, this collaborative crawling system will allow our crawlers to optimize the usefulness of Google's index for users by improving its coverage and freshness.
5. How do I get started?
Read 'How do I create a Sitemap' below to learn about the format for Google Sitemaps. We also have detailed documentation on the Sitemap Protocol and the Sitemap Generator if you'd like to skip straight to the technical details.
6. Do I need to sign up for a Google Account?
You don't need an account to generate and submit a Sitemap. However, we encourage you to sign up for an account so that you can track the status of your Sitemaps and view diagnostic information for your submissions. Having an account will not affect your site's ranking within our results. If you already use Gmail, Groups, My Search History, Alerts, or Froogle Shopping List, you already have a Google Account and can sign in with your existing account to use Google Sitemaps.
7. Will participating in this program change my pages' ranking in Google search results?
No. Using Google Sitemaps will not influence your PageRank; there will be no change in how we calculate the ranking of your pages.
Sitemaps
1. What is the Sitemap Protocol?
The Sitemap Protocol is a dialect of XML for summarizing sitemap information that is relevant to web crawlers. For each URL, you can include crawl "hints" like the last modified date and approximate change frequency. You can read more about the Sitemap Protocol here.
2. How do I create a Sitemap?
There are a number of methods you can use to create a Sitemap. You can use Google's Sitemap Generator, downloadable from Google Code - it's a simple script that generates Sitemaps for basic use cases. You can read more about the Sitemap Generator below. If the Sitemap Generator will not work for your site structure, we encourage you to write your own script for generating Sitemaps and share it with others.
3. Will Google crawl and index all of the URLs in my Sitemap?
We don't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it.
4. How do I submit my Sitemap to Google?
There are a number of ways to submit your Sitemap for inclusion in Google Sitemaps. The Sitemap Generator script can build and submit your Sitemap automatically. If you don't use the Sitemap Generator, you may also submi
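For anyone wondering what the crawl "hints" in the FAQ above look like in a file, here is a minimal Python sketch, not Google's Sitemap Generator. It assumes the element names urlset, url, loc, lastmod, changefreq and priority, and it assumes the namespace of the later sitemaps.org 0.9 schema; the original Google release may have used a different schema URL. The function name build_sitemap and the example pages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the later sitemaps.org 0.9 schema, not taken from the thread.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a str) for an iterable of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional crawl hints and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example_pages = [
    {"loc": "http://www.example.com/", "lastmod": "2005-06-03", "changefreq": "daily"},
    {"loc": "http://www.example.com/archive/", "changefreq": "yearly", "priority": 0.3},
]
sitemap_xml = build_sitemap(example_pages)
```

A site with thousands of pages would feed build_sitemap from its own page list (a database query, a CMS export) rather than typing entries by hand.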
Ermm this is all well and good and such but isn't a large chunk of this information already made available via Cache-Control and Last-Modified HTTP headers?
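To make the header-based alternative concrete, here is a rough Python sketch of how a crawler could use Last-Modified and If-Modified-Since without any sitemap. It only shows the date handling; the function names are hypothetical and no real crawler is claimed to work this way. Note that it still costs one request per page, which a single sitemap file would avoid.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def if_modified_since(last_crawl):
    """Format the last crawl time as an If-Modified-Since header value."""
    return format_datetime(last_crawl.astimezone(timezone.utc), usegmt=True)


def changed_since(last_modified_header, last_crawl):
    """Return True if the Last-Modified header value is newer than last_crawl."""
    return parsedate_to_datetime(last_modified_header) > last_crawl


# Hypothetical crawl time, for illustration only.
last_crawl = datetime(2005, 6, 3, 12, 0, tzinfo=timezone.utc)
header_value = if_modified_since(last_crawl)
```

A server that honours the header answers 304 Not Modified when nothing changed, so the crawler can skip the download.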
Reminds me of blog pings - what's wrong with using the Referer header? Doing some checking and then fetching the referering page and checking for linkage?
Has the world gone XML crazy?
I envision the interior of Google as this huge warehouse full of oversized transistors, data streams with paddleboats, waterfalls of caffeinated beer, chairs contoured like a keyboard key, where diminutive men in green hair sing songs about electrons and logic gates and if you wander into the room where Duke Nukem 3D is being tested you'll be thrown out.
"I have never won a debate with an ignorant person." -Ali ibn Abi Talib
I would be interested to hear what the benefits are to me for doing this? From the FAQ it indicates that it will not change my Page Rank. Now, I know the page rank does not really mean much to my overall ranking in Google results. However, if I go through the effort of creating and updating this whenever my site changes, how will it benefit me?
Adventure City Tours
It's not surprising that Google is using a Creative Commons license. The meme has been steadily gaining strength for over a year.
http://www.realmeme.com/miner/preinflection.php?startup=/miner/preinflection/creativecommonscontentDejanews.png
"According to the Google Blog, this is being done '...so that other search engines can do a better job as well."
I love the fact that they're saving us all a lot of time by giving Yahoo! access to this, so we don't have to wait for them to create their own version...
This sounds like a really cool idea.
Livejournal.com has had a number of problems with Google, and often just plain outright bans them from spidering the site. Part of the problem is that all the registered users have their journals at journalname.livejournal.com as well as livejournal.com/users/journalname. This means indexing the journals for registered users doubles the load on their server farm!
With something like this, livejournal would be able to define exactly how often the indexing process occurs, and could control which version of the URL is indexed.
I assume issues like this are far from unique.
This is a win-win. Google doesn't have to have its spiders crawl sites as often, server load on the various sites is reduced, and indexing frequency is in line with how often the webmaster wants the site to be indexed.
And licensing means that hopefully, the same XML file will end up being good for multiple search engines!
Very cool technology. Hopefully it's also highly abuse proof. I'd hate to see the results of something like this being used by the "Search engine optimization" firms.
"Live Free or Die." Don't like it? Then keep out of the USA
Do you n3ed V1agra or Sialis? We have the best and af728 most potent types fo...
It's too bad they couldn't figure out a way to add additional keywords to robots.txt (w/o breaking it). Now one needs to create both files for a site to index properly.
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
In other news, the Google Evil Index went down 3.2 points today, and is currently at 13.8, the lowest it's been since right before the beta rollout of Google Web Accelerator.
Pulp Audio Weekly - Geek News and Reviews
Somebody's using Lotus Notes as a webserver? May God have mercy on their souls.
(The submitter probably meant Lotus Domino, which is still a bad webserver, but not nearly as bad as Notes would be.)
Comment of the year
Okay, I've read the article but I guess I don't really get it. Why do we need an XML sitemap to give Google this information? Does this provide enough of an advantage over the unsupported and obsolete revisit-after meta? As for when things were last changed, wget seems to be able to figure that out just fine already. I'm guessing that it can be used to quickly inform a search engine that new pages exist on the site and I can imagine some nice things being possible to end users with the appropriate browser patches, but strictly from a search perspective, why is this needed? (Honest questions, I want to know if this is something I should have on my sites.)
Remember RFC 873!
Every single time Google is mentioned someone posts how tired they are of Google stories. We know you're tired of Google stories, slashdot editors know you're tired of Google stories, hell's bells, my fucking cat knows you're tired of Google stories.
Why don't you do everyone a favor and shut the fuck up about it already?
If people use this, it will likely remove much redundancy from google's indexing processes, possibly freeing up bandwidth and processing power in their datacenters for other projects like more web-based applications...
Navigation is sometimes the hardest part of the internet. A tree structure is sometimes the second easiest way of searching/browsing for information (the first being keyword searching). So maybe if more web designers set up server-side solutions, it will lower the burden on web designers. More importantly, it would move navigation away from web designers to users, just as Google moved content from web designers to searchers. So instead of overburdening web servers, as this Firefox extension (with screenshot) does by automatically generating a sitemap by crawling a site, sites could expose a sitemap through a favicon.ico-like convention or a link rel="sitemap.rdf or sitemap.xml" protocol, just as Netscape Navigator originally proposed a while back. I think web designers should pay attention - at least those that don't use Flash for their whole site. The web is slowly becoming a database of content rather than style. See the Webmonkey/Wired article on the Netscape sitemap feature (Sitemap RDF) or the sitemap slide (Slide from seminar).
I agree. I used to host (publicly accessible) Mailman archives, and once a month, Google would come through and scan every message. My bandwidth usage that day was at least 10 times what it was on other days, but I wanted the messages to be searchable. Using this to set them to "archive" so they'd only be scanned a couple times a year would've been great.
"Google is licensing the idea under the Attribution/Share Alike Creative Commons license. "
And I'm willing to license my idea, "better search engines with better user interfaces", to Google, for a modest sum.
--
make install -not war
And the next thing you know will be Google launching specs on web design and then content. Who will comply? well.. anybody who wishes to be indexed by Google. That is 100% of the website owners. And thus Google will control the design, content and other things... HELP... they are taking over the internet
This might be marked as troll... but think about it. Isn't it possible?
fuvoo: watch something
Python Rocks!!!!!!!
Seems to me that a better solution would be EITHER disallowing indexing of the registered users ljname.livejournal.com pages OR disallowing everything BUT ljname.livejournal.com, granting more benefit for registration.
Not gonna /. google, dude.
I'm wondering: why do you need a license to implement this? Did Google patent this?
In any case, patented or not, the CC license that this falls under seems acceptable for an open standard, even if it is patented, because it is transferable and because its requirements are minimal. Contrast this with the Microsoft Office XML license, which is royalty-free (for now...), but non-transferable.
It needs Python 2.2, and I only have 1.5 running. Unfortunately, so many things depend on it (*cough* Ensim *cough*) that attempting to upgrade is a death wish.
:)
Will wait until I get my new server.
PocketGamer.org - For the gamer on the go!
An idea cannot be copyrighted, and thus cannot be licensed under a copyright license like Creative Commons. File formats, being facts, shouldn't be copyrightable either. If the text of the spec is licensed as Attribution-ShareAlike, then all this allows is people to fork the spec, causing confusion.
Why should Google be different than any other topic?
Before any liberals are tempted to mod up one of my comments, a word of warning: I'm actually making fun of you.
Check it out. It can even tell you how many restaurants exist between Preferences and YRO.
And your contributions to the discussion are so helpful too.
Me too. I host a couple of sites on my home ADSL line, and my usage is about 6GB/month, mostly MSN, Google and Yahoo's crawlers indexing and reindexing the same pages over and over. MSN especially I would like to slow down.
The thing that seems so cool about this sort of thing is that it opens up the search service to the rest of us to help us make our content easier to find when it is updated. One thing that I have come to really respect about Google is that they don't rely on the government to beat Microsoft back down the way Netscape did. Google has managed to make a product that 47% of the US Internet users want to use, even though MSN is the default in IE. Remember Netscape 4? There's a reason that bloated POS failed, anyone who remembers the releases of it for the first six months that it went public knows EXACTLY why that was.
The only thing that Google can do at this point is continue to let some of their more biased employees run wild. They've been causing Google's Adsense and Adwords to take extremely partisan stances between the Dems and Reps, and that's gotten the ire of many on the right. My concern is primarily that Google will end up pissing off so many of these users that they will end up switching to MSN and helping Microsoft take Google down. Google is certainly not perfect, and I'm still wondering why Google News had the National Vanguard, a neo-nazi publication in their news feed list, but says that some of the bigger blogs like Michelle Malkin are not up to editorial snuff. Go figure, like the neo-nazis aren't biased or anything. Then there's their tendency to run ads for Hamas on their arabic pages.
Oh well, in many respects they still have a lot farther to go before they have tried as much evil as Microsoft and they are still more innovative, so time will tell.
Click here or a puppy gets stomped!
Just think of this sort of thing as inter-linking web services sitting on top of the http protocol.
Justin.
You're only jealous cos the little penguins are talking to me.
"Like it or not Google is an inovative company."
You are right.
No other company has ever launched an Internet Search function.
No other company has ever launched web-based email.
No other company has ever provided online maps.
No other company has ever offered the contents of usenet via the web.
No other company has ever offered navigable satellite photos of the planet.
No other company has ever offered realtime webcaching and compression to "speed up" one's access.
No company has ever cached websites for access when they are down or no longer available.
No company has ever offered a price-checking website.
Oh, hang on, wait a minute...
When you look at it, I mean actually take a step back and LOOK, Google is a highly derivative company, with not much in the way of true innovation.
They take existing ideas and functions, and tweak them. Coupled with their "geek coolness" and hero-worship, they are simply riding the hype wave.
People should not be afraid of their governments - Governments should be afraid of their people.
If you have a bunch of data in a MySQL database, ordinarily Google can't find it. You have to create a static link somewhere with a URL for the search you want to make googlable. Those take maintenance.
There may be some sites that want certain areas crawled, but not others, and those areas aren't maintained by the webmaster or only the top-level part should be hidden from search (which is awkward or impossible to handle with robots.txt). There are always user pages, maverick corporate departments, or whatever.
This offers a way to do all of that in a systematic way. Very nice way to solve several seemingly unrelated problems at once.
sigs, as if you care.
Now it's just a matter of time until some enterprising developer creates a browser extension that allows this data to be used by the end user during a surfing session. A consistent, complete, trustworthy, and easily-parsed site map definition could allow for some really interesting new paradigms in navigation around a site. Just off the top of my head I imagine a simple tree view of where you are in relation to the rest of the site could be very handy when navigating some of the gigantic maze-like corporate-sites.
My journal
Every single product they put out is slashdot worthy.
This is such bullshit. Some of the stuff they put out is very cool and newsworthy like Google Maps, Gmail, etc... But so much crap that is either not ready yet, not unique, or just plain boring gets posted here. It's literally a direct feed of the Google blog half of the time.
It also annoys people like me how people on Slashdot treat Google like it's the second coming of Christ. If anyone says anything negative they get bombarded with posts saying they suck or their ideas are crazy, and any criticism of Google is met with either "ITS IN BETA!!!" or "YOUR A MICROSOFT ASTROTURFER!!!". People also waste space giving Google a handjob for an idea that has already existed for years. The personalized Google portal and the Google satellite pictures are two examples that come to the top of my head. Both of these things had been done by other sites for years; then Google comes out with them, and in the case of satellite pics did not improve upon the existing sites out there, and in the case of the portal made an inferior product. Yet when Yahoo or MSN come out with a product that is an attempt at improving something existing, those same people say "COPY CATS!". It also doesn't help that the customized Google Portal only allowed you to add two news sites and one of them was Slashdot. Furthermore, many Google people post here, and they mention Slashdot in the Google blog, so I think another thing that annoys me is that there is a large amount of suspected Google astroturfing here.
Last I heard they had blocked ALL Google indexing. robots.txt is somewhat restrictive.
"Live Free or Die." Don't like it? Then keep out of the USA
[To ELP's "Lucky Man"]
They had white pages
And hits by the score
All the people's queries
Waiting by the door
Ooooh, what a search engine it was
Ooooh, what a search engine it was
Many geeks and hackers
They made up its core
Everybody's dearest
A daily stop for more
Ooooh, what a search engine it was
Ooooh, what a search engine it was
It went to the market
Of the engines it was king
Of his honor and his glory
Slashdot would sing
Ooooh, what a search engine it was
Ooooh, what a search engine it was
A burst had found it
Its money dried as it sank
No praise could save it
So it vanished and it died
Ooooh, what a search engine it was
Ooooh, what a search engine it was
Overall this is offloading Google's workload onto webmasters.
You do have to wonder how much of the 'do no evil' philosophy is cover for the "let us store and index all information about everything, including you" philosophy. Not that I'm going to stop using Google until their results become less usable than Yahoo's results...
I think we slashdotted Google! I get 502 Server Error all the time and can't connect with any of the Google pages in the article.
It's also possible that Google's CEO could go on a murderous rampage tomorrow at Microsoft's Redmond campus. 0.000000000001% is still a possibility you know. Do you realize what would happen to Google if they did that? They'd be dumped by most website owners faster than they could count the drop in their search and ad hits.
Then again, Google coming up with detailed design guidelines for their pages for public consumption would be incredibly useful for designers. They use a lot of cutting edge JS tricks like AJAX and their layout is great. It's very clean and the kind of thing I wish I had the skills right now to emulate, but I have too much to learn about web design right now to do that.
Click here or a puppy gets stomped!
You can google slashdot
But you can't slashdot google!!!
On february 16th i sent google the following email to suggestions@google.com: Hi,
This is a suggestion for the people who take care of indexing web sites.
Because Google is the first search engine of choice it has enough of influence to point noses into the same direction.
So, i propose a new element to be added to websites: a sitemap file. Similar to the favicon file, every site could have an (xml?) file containing information about the info and the info-topography on the site.
Google has already a 'similar pages' link added to search result. What about adding a link 'show context'. If clicked upon a page is shown that provides info on where the search result is located on the site: the context of the information.
The sitemap file could also be used in Google's core indexing process: providing extra context to evaluate the validity of the indexed page.
Some other related advantages: Google could release a sitemap/browser plugin for users. For example: open a site and if the website contains the special sitemap file, a browser plugin is activated allowing the user to browse the website using their preferred navigational tool (instead of, or together with, any normal website menus).
I hope to hear from you
Kind regards,
mynamehere
The Netherlands
They even used the term 'sitemaps'.
OMG!!! We finally /.'d Google!
Um. It's the most obvious name, isn't it?
And I somehow suspect this has been in the works since before Feb of this year.
According to this Yahoo's bot is the most aggressive on my site. GoogleBot is really quite tame.
isomerica.net | Foonetic IRC
Instead of having to notify search engines (blech)
What about a robots.txt extension to define the
location of the sitemap index?
Were that I say, pancakes?
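The robots.txt idea above can be sketched in a few lines of Python. The assumption here is a "Sitemap:" line in robots.txt giving the sitemap's location. That was only a suggestion when this thread was written, although the later sitemaps.org protocol did adopt a directive of this form. The function name and the sample file are hypothetical.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
example_robots = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: http://www.example.com/sitemap.xml  # assumed directive
"""
found = sitemap_urls(example_robots)
```

With this in place a crawler discovers the sitemap on its normal robots.txt fetch, and the webmaster never has to notify each search engine separately.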
Now even google can't withstand the power of slashdotting...
Google: Error
Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
Reaching out to help the less fortunate search engines, how philanthropic.
My RSS feeds already publish my newest/freshest pages. Why didn't they just extend RSS with some additional attributes/tags instead of forcing me to implement another XML format?
Google
Error
Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
I'm not sure how alive and well Paid Inclusion (or whatever it is called nowadays... is it "Search Submit" now?) is at the moment, but they have had solutions for ensuring timely updates and ensured inclusion in their index for some time now. Commercial solutions, but still.
So I have a hard time imagining Yahoo touching this particular piece of Google technology any time soon... Unless they can assimilate it into the commercial offering.
I see an execution gap here, though. My blog is, what, 2600 pages? I'm obviously not going to build that XML file manually (with one node for each page). Google does provide a Sitemap Generator, but it's Python code meant to be run on my web server. My Python skills are nil, so that route isn't viable for me either. I expect that there's a good many 'webmasters' (as in, people who design and run websites) who don't know Python from perl. Given the CC license, though, maybe somebody will grab the code and build an idiot-proof solution for the Sitemap Generator.
Bingo! You hit the nail on the head. This is just like the big-box stores and grocers making you scan and box your own purchases, and not even giving you a discount for it.
You are exactly right. The HTTP head can give info about whether the page has been changed or not and thus decisions can be made about whether to respider or not.
While there may be epi-phenomenal benefits to this scheme at some later point, I agree with you that this is simply google offloading what should be their own workload onto the web-sites.
That's a thing of beauty. Well, not really, it's a damn shame to waste a domain name on a nearly plain-text page, but it's still pretty funny. Does anyone really love google enough to host a page like that on their own? Wow, if so. I mean, I've always liked google, but would I rent out a domain to host an anti-anti-google website? I doubt it. Thanks for that, though. Definitely a +1 interesting from an AC.
I had a very bad experience with the Python Sitemap Generator from SourceForge using the 'accesslog' option. I plugged in a 10MB site log from our corporate site, Great Seats to Sold-out Events, which has ~22,000 pages.
Within five minutes it crashed my development server, a 3200 MHz Pentium 4 with 2GB of RAM running Debian Linux. Just imagine if this had been the production server... the costs for over-utilizing the webserver
For the details, see http://www.incendiary.ws/node/94 Please syndicate my content if you want :-)
Promote freedom; fight fascism.
Google creates innovative technologies and technological implementations of other people's models.
Frequently the google implementation is vastly superior in some purely technological manner. In this way they are innovative.
Frequently, google is able to turn the superior technology into a superior user experience as well. In this way, too, they are innovative.
Frequently, google's cool creations are hyped beyond all belief. In this manner, they are... erm... the recipients of a geek love-affair, and not at all innovative.
nuff said
Got Code?
Well now. Google is the Messiah, and MS is the anti-christ. Don't you just love the fact that we're 80,000 completely random people. Where's that silent majority that every "Slashdot has no Hypocrisy" defender mentions?
A great idea - much better than waiting for the deep crawl. So far I've only seen a 502 error, but they are no doubt experiencing a deluge.
http://www.myrealtalk.com/
i see the problem here. no one is reading livejournal postings except google's spider.
if they blocked google, they could probably reduce their bandwidth enough to run all of those sites on a cable modem!
indierock / punkrock band photos and more... http://www.digitaldefection.net
I had been writing a primitive sitemap generator myself using shell scripts,
essentially using "find" and "grep" alone, but this tool is much better,
faster and easy to configure. Cool.
Note that this tool will allow Google to reach files which would never be
found by spidering a site, because the files are not linked. If you
include something like
<directory path="/var/www/html" url="http://www.example.com/" />
in your config.xml and run "sitemap_gen.py" on it, you will give the world
access to a large amount of material
(like test versions of your website or source code you did not want to
make accessible). We might see a lot more material which had been
'hidden'.
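To show why a directory walk exposes unlinked files, here is a minimal Python sketch. It is not how sitemap_gen.py is implemented; the function name, the skip_dirs parameter and the throwaway directory layout are all hypothetical. Every file under the document root becomes a URL unless its directory is explicitly pruned.

```python
import os
import tempfile
from urllib.parse import quote


def urls_for_directory(root, base_url, skip_dirs=()):
    """Map every file under root to a URL under base_url.

    Directories named in skip_dirs are pruned, so unlinked material such
    as test copies of the site stays out of the listing.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            urls.append(base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/")))
    return sorted(urls)


# Throwaway document root with one public page and one unlinked draft.
with tempfile.TemporaryDirectory() as doc_root:
    os.makedirs(os.path.join(doc_root, "test-site"))
    for rel_path in ("index.html", os.path.join("test-site", "draft.html")):
        with open(os.path.join(doc_root, rel_path), "w") as fh:
            fh.write("placeholder")
    everything = urls_for_directory(doc_root, "http://www.example.com/")
    pruned = urls_for_directory(doc_root, "http://www.example.com/", skip_dirs=("test-site",))
```

Without the pruning step the draft page ends up in the listing even though nothing on the site links to it, which is exactly the exposure described above.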
These guys are turning the world around every day.
Nope. LiveJournal users have an option to allow indexing or not. It's off by default but can be enabled by simply checking a box. My LJ is spidered by Google no problem.
Good thing, too, because Livejournal doesn't provide any way to search journals, even your own. If Google wasn't indexing it I'd never be able to find any old posts of mine. I wish other LJ users would enable this, as there's nothing more frustrating than being unable to find something you know someone posted six months or a year ago.
I like my women like my coffee... pale and bitter.
You shut up. The GP actually has a valid point. It almost seems like the only remaining difference between Google and Microsoft is that slashdot loves Google and hates Microsoft. I'm also surprised there isn't a google.slashdot.org subdomain yet...
I just ran the tool as a test on my pathetic little WordPress blog (it has a total of 3 pages). It happily chugged away and reported 842 files by combining the complete blog directory, awstats sub directory and the urls it found in the Apache access_log.
That would be a lot of non-essential crud for Google to spider.
Using filters is very simple, so the following filters removed most rubbish:
<filter action="drop" type="wildcard" pattern="*index.htm*" />
<filter action="drop" type="wildcard" pattern="*awstats*" />
<filter action="drop" type="wildcard" pattern="*wp-admin*" />
<filter action="drop" type="wildcard" pattern="*wp-includes*" />
<filter action="drop" type="wildcard" pattern="*wp-content*" />
<filter action="drop" type="wildcard" pattern="*wp-images*" />
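The effect of drop filters like these can be previewed with a small Python sketch before running the real tool. This assumes shell-style wildcard matching via the standard fnmatch module, which may differ in detail from how sitemap_gen.py evaluates its filters. The keep_url name and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The same wildcard patterns as the drop filters above.
DROP_PATTERNS = [
    "*index.htm*",
    "*awstats*",
    "*wp-admin*",
    "*wp-includes*",
    "*wp-content*",
    "*wp-images*",
]


def keep_url(url, drop_patterns=DROP_PATTERNS):
    """Return True unless the URL matches one of the drop patterns."""
    return not any(fnmatchcase(url, pattern) for pattern in drop_patterns)


# Hypothetical candidate URLs, for illustration only.
candidates = [
    "http://www.example.com/2005/06/hello-world/",
    "http://www.example.com/wp-admin/post.php",
    "http://www.example.com/awstats/awstats.pl",
    "http://www.example.com/index.html",
]
kept = [url for url in candidates if keep_url(url)]
```

Running the candidate list through keep_url first makes it easy to spot a pattern that drops too much or too little.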
I've entered a few sites I run and 2 of the 4 gave a "not found" error and then when retried, it worked. Perhaps a local DNS problem but probably something not working well enough on the google side...
http://www.hawknest.com/
Just demonstrates the editorial discretion of Slashdot. My news -- the first negative report of sitemaps -- was published by all the major search engine-related websites (over 20) yet here remains only a mod +1.
Promote freedom; fight fascism.