Google and Yahoo! Working Together On Better Web Indexing
Karzz1 writes "In an exclusive video interview with WebProNews, Yahoo and Google announced a collaborative site called sitemaps.org. Yahoo!'s Tim Mayer states in the video, 'This is something we are announcing tonight at around 9 PM tonight (Las Vegas) Google and Yahoo have gotten together to provide webmasters and publishers a unified way to send their content... let our search engines know about new and existing content.'"
I'm confused--when Microsoft does something good, do we just ignore it? You know, I'm all for criticizing their evil plans for world domination in the software market, but shouldn't news be objective, not subjective, even if it is only for nerds?
Side note, I'll bet this post hits rock bottom like any other post that says something positive about Microsoft.
My work here is dung.
i can see it now, GooYahoo
Wulfram II - Free Online Multiplayer 3D Tank Shooting Gam
This is obviously a tech-community spin to avoid tainting the news from the start.
Like if you were hosting a conference on global peace you might keep quiet about Dubya being a keynote speaker.
These stories are free but worth money.
So who is the fourth horseman of the apocalypse?
How typical of Slashdot to not mention that Microsoft is also involved in the site.
Why not just have a link from your main page to an HTML sitemap that links to all pages on your site?
Nice and easy. And usable by people and crawlers.
Let's say I have robots.txt set to deny everything, but I submit some pages to this thing for indexing. Does the spider obey robots.txt or what was submitted? Actually I'd find it handy to keep the spiders the hell off of my site but just submit a couple of pages, but I don't see how that system could be trustworthy at all. Is it just me or is this just another form of meta tags?
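This doesn't settle which one a crawler would honor, but the conflict itself is easy to spot before you submit anything. A minimal sketch using Python's standard `urllib.robotparser`; the `blocked_urls` helper and the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]

# Hypothetical pages you might list in a sitemap.
SUBMITTED = [
    "http://www.example.com/public/page.html",
    "http://www.example.com/private/page.html",
]

DENY_ALL = "User-agent: *\nDisallow: /"
DENY_PRIVATE = "User-agent: *\nDisallow: /private/"

if __name__ == "__main__":
    print(blocked_urls(DENY_ALL, SUBMITTED))
    print(blocked_urls(DENY_PRIVATE, SUBMITTED))
```

With a deny-everything robots.txt, every submitted URL shows up as blocked, which is exactly the contradiction described above.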
Microsoft are evil and they suck.
I like Apple. They make nice stuff which works most of the time.
I'm for once interested in a joint project between search engines. If Google and Yahoo! can play nice, and Microsoft is mentioned as being part of this Web 2.0 menage-a-trois, perhaps something interesting will come of this. But right now it does just look like they want to make it easier to index pages. I've been attempting to submit my sitemaps to Google for ages and have yet to see my sites listed when searching for my keywords, but perhaps that'll change in the future if this works out. Guess I'll keep my eyes and ears open.
"Pi is exactly 3!" *gasp*
It's too bad that the specification only covers information relevant to search engines.
How about a <description> tag? I would take great interest in a sitemap specification that gives me enough information to navigate major parts of a site with a viewer plugin (of some sort) in a web browser.
There's nothing worse than fumbling around navigating page after page when the web server is slow, the pages are image- or ad-heavy, or the navigation on the page just plain sucks.
Karma: Raspberry Kiwi
Write (or snag) some generic XSL and use wget and xsltproc. All my sites have used xsltproc via cron to regenerate an HTML sitemap from the Google sitemap XML since the sitemap thing was launched.
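If you don't want to write XSL, roughly the same thing can be done in a few lines of script. This is a sketch in Python's standard library rather than the wget/xsltproc setup described above; `sitemap_to_html` and the sample URLs are invented for the example, and it assumes a plain `urlset` file in the sitemaps.org 0.9 namespace:

```python
import html
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_to_html(sitemap_xml, title="Site map"):
    """Return a plain HTML page that links to every <loc> in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    items = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if url:
            # Escape the URL so "&" in query strings stays legal HTML.
            safe = html.escape(url, quote=True)
            items.append(f'  <li><a href="{safe}">{safe}</a></li>')
    body = "\n".join(items)
    return (f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body>\n<ul>\n{body}\n</ul>\n</body></html>")

# Made-up sample input.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/news?id=1&amp;lang=en</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(sitemap_to_html(SAMPLE))
```

Run it from cron against the fetched sitemap.xml and write the result wherever your HTML sitemap lives.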
If you're using a text mode browser, this should be obvious...
So how can I submit my sitemap to Yahoo! and Microsoft/search.live.com? The FAQ says something about sending an HTTP request to /ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml, but it doesn't say which search-engine-specific URLs to use.
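For what it's worth, building the request itself is the easy part. A sketch in Python, where `build_ping_url` is a made-up helper and `search.example.com` is a placeholder host, precisely because the real per-engine hosts are the missing piece:

```python
from urllib.parse import quote

def build_ping_url(engine_base, sitemap_url):
    """Build <engine_base>/ping?sitemap=<percent-encoded sitemap URL>."""
    # safe="" makes quote() encode ":" and "/" as well, matching the FAQ example.
    encoded = quote(sitemap_url, safe="")
    return f"{engine_base.rstrip('/')}/ping?sitemap={encoded}"

# Placeholder search-engine host; not a real ping endpoint.
PING = build_ping_url("http://search.example.com",
                      "http://www.yoursite.com/sitemap.xml")

if __name__ == "__main__":
    print(PING)
```

Once the real hosts are known, sending the ping is a single GET to that URL.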
Lukasz
Hikipedia - free database of hiking trails
Lukasz Anforowicz
Google is going to take over the world... and if their company record is any indication of their rule, I for one welcome our new advanced search indexing overlords.
Most people aren't thought about after they're gone. "I wonder where Rob got the plutonium" is better than most get.
Get your own free personal location tracker
Many here ask why this is more than robots.txt. For one, it lets you add URLs that are driven by databases and parameters, things that the SEs do not index too well. It also adds last-updated stamps and a priority for re-visits.
Why is that important? If I have one page where I always post the latest news, I can have the spider revisit it every hour, so it gets indexed ASAP, while the spider can go easy on the rest of my site. I can also train that spider for a burst if, for example, I have an ongoing live event and post results ASAP.
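To make that concrete, here is a rough sketch of generating such a file in Python. The element names (`loc`, `lastmod`, `changefreq`, `priority`) and the namespace are from the sitemaps.org protocol; `build_sitemap`, the URLs and the dates are made up, and the engines treat these values as hints, not commands:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
FIELDS = ("loc", "lastmod", "changefreq", "priority")

def build_sitemap(pages):
    """Serialize a list of dicts (keys taken from FIELDS) as a sitemap urlset."""
    # Emit the sitemap namespace as the default namespace, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in FIELDS:
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical site: a busy news page and a quiet, parameter-driven archive.
PAGES = [
    {"loc": "http://www.example.com/news", "lastmod": "2006-11-16",
     "changefreq": "hourly", "priority": "1.0"},
    {"loc": "http://www.example.com/archive?page=2&sort=date",
     "changefreq": "monthly", "priority": "0.3"},
]

if __name__ == "__main__":
    print(build_sitemap(PAGES))
```

For the live-event burst, you would regenerate the file with a fresh `lastmod` and a more frequent `changefreq` on the results page while the event runs.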
Google has been on to this for a while already. However, Yahoo's facilities were not so comprehensive and MSN was missing altogether. Now we get a chance to create one single format and have all the (big) SEs read it, and maybe the secondary spiders will catch on too.
I only find it sad that they did not incorporate the ROR (Resources of Resources) RDF framework. I'm also missing a discovery mechanism, such as an extension to robots.txt or a meta tag (or link rel="...") in the home page of a site.
Another use of this would be to download a full site, as you now know where it starts and what belongs to it.
Overall this is a good thing, where I come from.
Busy helping non technical users of OpenOffice.org - http://plan-b-for-openoffice.org/
The Django web framework added support for 'google sitemaps' over a month ago. Google announced the details of sitemaps over 3 months ago. Django Sitemaps: http://www.djangoproject.com/documentation/sitemaps/
While it's true that Django has support for the sitemaps protocol and sitemaps are nothing new for Google, Yahoo and MSN's support of the protocol is new.
Call me an XML nazi but the example usage has unclosed tags:
http://www.sitemaps.org/protocol.html
As an XML specification that is likely to be used by people who aren't experts, don't you think it would have been a good idea to use *valid* XML in the example usage?
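Strictly speaking, unclosed tags make a document not well-formed, which is a lower bar than valid, and it is trivial to check for. A minimal sketch with Python's standard parser; `is_well_formed` and both snippets are invented for illustration and are not copied from the protocol page:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Made-up snippets: the second one never closes its <url> element.
GOOD = "<urlset><url><loc>http://www.example.com/</loc></url></urlset>"
BAD = "<urlset><url><loc>http://www.example.com/</loc></urlset>"

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD))
```

This only catches syntax problems; checking a sitemap against the schema would need a validating parser.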
Google has had this for a while now. I had noticed that development has been healthy recently. Whereas before it was a relatively unnecessary tool, now it's actually useful.
If it's as useful as Google Sitemaps, then I'm happy with today's news. The protocol does look pretty similar (and by pretty similar, I mean the XML structure is virtually identical). I'm guessing porting Google Sitemaps over to this new one will be painless.
This fall, I released free source code for people to use a PHP Class to generate SiteMaps for Google - and it seems like the standards group adapted Google's format. The code is perfect for dynamic database driven sites that can't readily use perl-scripts that sometimes perform this task. http://www.idealog.us/2006/09/google_sitemap_.html
Geez. First, there were meta tags. Then robots.txt. Then this. Plus proprietary indexing-control tags. Plus ridiculous weight factors (like, for example, giving "title" huge weight index). Plus citation index. How much is enough?
What's the problem? Well, let's see. Most of these things have nothing to do with the actual content of the website. Hence, creating a bot-friendly website is no longer directly related to creating a user-friendly website. In fact, these two goals often conflict.
They're trying to sell the sitemap.xml idea, yet they themselves have no sitemap.xml file.
I just mused about the search-unfriendliness of AJAX apps yesterday and how that could be solved, and today the big three are banging on (almost) the same door. How do you think we could go about solving the issue?
I misread your post; I'm off-topic. Still, I can't imagine why a content provider would use different priorities. It could only hurt the rankings of some of the low-priority pages. Unless, of course, the search engines gave your high-priority pages an equivalent boost.
Take it from me, me bys, dere's no better place fer a vacation den up e're on de rock where de liquors hard and de sluts are everywhere!
-1 Uncomfortable Truth
Ok, I welcome the new standard to improve search (crawling) quality without increasing (possibly reducing) load on web servers.
However, with the current protocol, a site owner can only send pings to the search engines she knows. Probably Google, Yahoo! Search and Live Search will cover more than 99% now (I am not sure), but it is not very satisfactory since it would block out other minor (or new) engines. (I am the same Coward as #16895296.)
I hope "Sitemap protocol 0.91" will include two additional features:
1. Autodiscovery (like feeds)
2. Some way for a search engine to "subscribe" to the sitemap (like mailing lists, not like USENET groups; I am new to feeds and I do not know whether "subscription" to feeds has the same meaning as "subscription" to mailing lists).