Is Microsoft Crawling Google?
triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks they might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."
Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.
Sure, I see crawlers on my site all the time, sometimes hitting the same URL over and over again. Do I understand their repetitive behavior? No. Do I care what they are doing? No, as long as they are obeying my robots.txt.
I have complained before about MSNbot ignoring changes to robots.txt while Google happily changed its habbits (I can't find the link, sorry). My recent fighting with Googlebot has come to a head when I had to disallow them access to my gallery completely because they refused to honor anything except Disallow: /. I had to go so far as to point Googlebot at my robots.txt and tell it to remove all the previous links. It was rather annoying dealing with support via email from Googlebot as they have apparently taken on the stance of "we don't care but you should put meta tags in all your files so that we don't index those pages." Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.
Do I care if MSNbot is crawling Google and then finding sites and links to search? No as it's none of OUR concern. What is OUR concern is our own robots.txt and how the spiders interact with our sites through that file. Let Google deal with Microsoft/MSNbot if that's what needs to be done but don't concern yourself with it otherwise.
All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.
You mean M$ is searching through somebody else's stuff? Well... I'll be damned...
If not, it's called doing business and gaining an advantage any legitimate way that you can.
I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
The new search engine's name will be Mooglesoft.
Really, since the Google search results are public knowledge, why wouldn't Microsoft crawl Google's stuff? If MSN Search can crawl the web, why should it limit itself to everything except Google/Yahoo? Although this tactic may work, importing all of Google's massive search database might take a while.
My UID is prime is yours?
Couldn't Google just crawl Microsoft in return? Then they'd be stuck in an endless loop, and William Shatner can then swoop in, crack some skulls, and save the day.
Or something like that.
biffnix
Don't Die Wondering
Doesn't that mean even more results?
I'd do the same thing if I could. This is all "speculation" anyway, but since it feeds the stereotype of the insidious Microsoft, it gets posted front page to this "tech news" site.
Nah, never happens....
100% Insightful
Surely this is just as illegal as going through someone else's trash; it is still not your property...
I guess that is the point
I can say that they have been crawling like mad as of late: Google, Yahoo, and MSN. I say this because my site has had a lot of traffic from all three, and it is not a popular or even an important one. Not just once a week or a few times a week, but every day. There are big updates coming. I was not surprised to see the article about Google doubling their index; I knew something was coming from the way they are crawling unimportant/unpopular sites.
As long as it's legal and helps Microsoft, I highly doubt that Microsoft would be concerned about the ethics of doing such a thing. The author is probably right.
I think Google could have some fun if MS was indeed just screen scraping... I don't think it would be too far fetched to alter results for a certain Microsoft-operated IP.
more evil than satan
ROOFLES!
Such trouble. Just buy the damned company.
Well, that kind of business practice would be completely out of character for Microsoft.
This is a non-story. A good Slashdot headline will be when they get caught actually NOT doing something like this.
Microsoft Has Original Idea and Implements it By Themselves
From the 70%-of-slashdot-editors-suffered-heart-attacks -reading-this-submission Dept.
Neither Google nor MSN "profit" from crawling your site, garcia.
They're taking on Square Enix too? o.0
Can both crawl up my ass.
And who cares what Jamie Crowell (or whoever), random blogger, thinks MSN might be doing, no doubt based purely on "ms sucks" rhetoric?
I don't need no instructions to know how to rock!!!!
"Google happily changed its habbits..."
Google is Catholic?
The Geek Crew
So, what name do you favor for the combined fork and spoon utensil?
Spork or foon?
The Internet is full. Go Away!!!
Doing this for, say, 100,000 domains would be noticeable but would not even scratch the surface of what's on the web.
----
The question is why? If they are doing this, are they simply going to present the results as their own, or are they going to work some magic and find the most relevant search results from ALL the engines and use those?
In the first case, it's a slimy business practice. In the second, it's fairly cunning ( and has been tried before ).
In either case, I doubt google is in any real danger. They are to search engines what MS is to the desktop. And while MS has squandered that advantage in the desktop arena ( reader homework: 250 word essay as to why ), google is only improving on their work.
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Yes, they most certainly do profit from the data they have amassed. If they didn't spider sites they wouldn't be visited by the public who wouldn't see their targeted ads.
Thus, they profit from my data.
Why can't Google just block MS from crawling their site? Wouldn't Google notice if other spiders were crawling them?
Maybe partying will help...
It was how Mr Gates learnt to code in the first place ;-)
i dont get it....
But doesn't Google index other search engines as well?
Maybe it's just me, but this beta search engine page renders better in Firefox than in IE. What browser are MS's devs using for their testing?
If you've been watching the logs to your site lately Microsoft has been RAPING most servers. Most crawlers will pick through pages with large lists 1 at a time, then come back every hour or so.
MSN starting last week has been pulling EVERY LINK in sequence from my site. Even the larger Artist Index pages of my site.
Seriously, I've had this same spider on my site for about 36 hours now.
Next thing you know, MS will be suing Google for IP rights to their cache. That, or buying the cache from them for $50k and then sucking to IBM.
Read the only personal Runyon page out there.
Microsoft could always have the google queries come from the user's computer, and integrate the results on the user's computer before displaying it. This would be impossible to block with IP address, but may be blockable with some sort of query heuristic. I'd think this could be done with Java or ActiveX pretty easily (I'm more of an embedded programmer...)
HIV Crosses Species Barrier... into Muppets
After reading the article all of this is based on one result and a bit of speculation. However, if true, I would hope Google quickly finds a way to block this.
What would be funny is if Google could detect when it is Microsoft sending a query through their system and return random results. Or return 5000 results all of which are redirects back to the MSN search page. And of course, Microsoft can't complain about such a thing because in doing so they'd admit they're trying to use Google's results.
I wonder how long some of the less intelligent MSN users would spend at the search page clicking on links that redirected back to the MSN search page?
The claims are so absurd I don't even know where to start.
1) His whole theory is based on the "fact" that the only way in the world to find his pages is to use site:www.sitename.com in Google, implying that Google has cached the results from an earlier crawl. Of course, there is no way that the Microsoft search couldn't have also cached it.
2) Then, he claims that Microsoft is probably screen-scraping Google's results (for all the millions of sites out there), and using these results to recrawl those sites? This doesn't even make any sense.
3) And last but not least, Microsoft is certainly basing its whole search architecture on the assumption that Google wouldn't ever notice MSN mirroring its whole index. Yeah right.
When men used to be men
Google: Do No Evil
Microsoft: We'll Decide What's Evil, Thank You Very Much
I want to delete my account but Slashdot doesn't allow it.
You're likely to piss someone off. Like the mayor.
If you copy his work without permission, you've already committed copyright infringement -- so yes, you violate the TOS by default.
Comparing this to the MS/Google situation is not the same so the grandparent post still stands.
It's almost refreshing to see that the Internet may well be catching up to television...Media maturity at last!
... the MSN search beta that I saw stole everything from Google anyways.
The user interface originally looked like Google. They clustered commodity PCs in the same 'shard' configuration as Google. Their ranking algorithms considered links like Google.
They have done nothing innovative, and they are continuing to chase taillights. Let's hope they don't catch up.
My website is the #1 site listed with specific Criteria on Google. Consistently for the last 2 months. I try the same thing with MSN search and My site does not even show up at all.
If they are searching Google, they haven't done it recently, or else they haven't gotten to my site yet.
It would be easy for Google to insert a small fraction of non-sequiturs in the results, look at Microsoft's search results, and then sue for misuse. Even if MSFT uses random proxies to avoid detection, it cannot manually recheck all the hits to make sure they are correct (if it could, it would have the resources to check all the sites itself and would not need to crawl Google). A few made-up sites or inappropriate search hits would be enough to establish a pattern of abuse.
Two wrongs don't make a right, but three lefts do.
I might be mistaken, but I thought google has a 10,000 query limit per IP address per day. So it might be conceivable that enough computers over several days could get it, though I imagine it wouldn't be trivial
:)
I think this is mentioned in Google Hacks by O'Reilly. Those with an online account there can check it out and mock me if I'm wrong
my last sig was too controversial... now, a new and improved useless sig!
search google for site:google.com
The fool is trying to con you.
I see bots hitting a cgi test set-up forum I ran 2 years ago (before uploading to remote ISP) STILL try to index pages. I think the bloke is spot on with his analysis.
You can't get to every page on the internet just by starting at one page and recursively following links; therefore, the more places you start from, the more likely you are to have 100% coverage.
I could imagine that Microsoft just needs a few thousand URLs evenly spread across the internet to seed their crawler, which they can get from Google by using a list of the most popular queries.
Once their crawler has so many starting points it can do the rest itself.
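To make the seeding argument concrete, here is a minimal sketch of a breadth-first crawl over a toy in-memory link graph. Everything in it is an assumption for illustration: the page names are made up, and nothing here reflects how MSNbot or Googlebot actually work. It only shows that a crawl started from one seed never reaches a cluster of pages that nothing links to, while adding a second seed does.

```python
from collections import deque

def crawl(link_graph, seeds):
    """Breadth-first crawl: return every page reachable from the seed pages."""
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        page = frontier.popleft()
        for target in link_graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

# Hypothetical toy web: two clusters of pages with no links between them.
toy_web = {
    "a.example/": ["a.example/1", "a.example/2"],
    "a.example/1": ["a.example/"],
    "b.example/": ["b.example/1"],
    "b.example/1": [],
}

one_seed = crawl(toy_web, ["a.example/"])
two_seeds = crawl(toy_web, ["a.example/", "b.example/"])
print(len(one_seed), len(two_seeds))
```

With a single seed the crawl stays inside the first cluster; the second seed is what brings the other cluster into view.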
Someone else can see thru garcia's whining! Hey garcia: It's the internet. Either manually block the bots or STFU.
They could just be comparing results between the two engines... for testing purposes.
It's called a router. It can be set to null route whole chunks of IP address space. Set it to forget where Microsoft is and forget it.
Anybody know what IP address ranges msnbot is using? Might be possible to limit the rate of connection from those addresses using firewall rules (or, for that matter, forbid connection entirely if that's your preference) to avoid the "hammering" that msnbot is said to be doing...
Hacker Public Radio is our Friend
that article was so ambiguous..."some person was searching some site and it was being spidered by some MSN bot and the links were added sometime after"
Yay, way to go Slashdot, thanks for posting the most blatant flamebait article ever. How about for your next post you repost that Reuters article about a machine that makes more energy than it uses....
Ave Molech Setting
for the search argument - linux
google - Results 1 - 10 of about 203,000,000 for linux [definition]. (0.22 seconds)
msn search - Web Results 1-9 of 28,254,249 containing linux (0.19 seconds)
it looks like the m$ search is just a toy
I just grep'd my access log file to see what I get...
I get very similar results as the article discusses, except with the IP 65.54.188.149.
grep for 65.54.188... lets see what others get.
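For anyone who would rather script the check than grep by hand, a rough sketch follows. It assumes each log line starts with the client IP (as in the Apache common log format), and it treats the /24 around the address quoted above as the range of interest. That range and the sample lines are illustrative assumptions; the real msnbot ranges are not given in this thread.

```python
import ipaddress
from collections import Counter

# Hypothetical range: the /24 around the address quoted in the comment above.
SUSPECT_NET = ipaddress.ip_network("65.54.188.0/24")

def hits_from_network(log_lines, network):
    """Count requests per client IP for lines whose first field is in `network`."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            client = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is a hostname or garbage, skip it
        if client in network:
            counts[fields[0]] += 1
    return counts

# Made-up sample lines, only to show the shape of the input.
sample_log = [
    '65.54.188.149 - - [01/Jan/2000:10:00:00 +0000] "GET /a.html HTTP/1.0" 200 512',
    '65.54.188.149 - - [01/Jan/2000:10:00:05 +0000] "GET /b.html HTTP/1.0" 200 734',
    '192.0.2.7 - - [01/Jan/2000:10:00:09 +0000] "GET / HTTP/1.0" 200 1024',
]
print(hits_from_network(sample_log, SUSPECT_NET))
```

To run it on a real log, pass an open file object instead of `sample_log`.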
And got banned from using google. Seriously.
Please Slashdot this link.
Isn't Google a webpage? Is MSN doing anything wrong by indexing a webpage and its subpages?
Look at it this way. If Google were to complain about someone searching their page/databases, they would be the largest hypocrites in the history of history.
0110100100100000011000010110110100100000011000100
The author suggests that microsoft must be scraping google b/c the only place _he_ could find the URLs they're requesting was google's cache.
Uh.
Microsoft has been developing their internal search engine for quite a while now. Part of developing a search engine is using it to crawl and creating a large corpus of test data. It's hugely likely that M$ has had a working crawler system for much, much longer than would be indicated by their public announcement. Quite a few people who helped develop Altavista at HP/Compaq/DEC research joined Microsoft Research about two years ago - the kind of people who could write a high-performance crawler in their sleep and wake up feeling refreshed.
That article seems like baseless, uninformed speculation, to put it not-so-politely.
I'm certainly no Microsoft groupie, but this behavior may not be as sinister as it seems. After all, Google is on the internet, too. There are links found all over the internet to Google, with some specific search term embedded in the URL. If MSN's bot happened upon a link to a Google search page, is it somehow wrong for the MSN bot to follow that link, and spider as normal?
Visit the Game Programming Wiki!
Try entering a known Googlebomb into the MS search engine. "litigious bastards" shows www.sco.com as the number one hit.
Microsoft is assimilating Google!
modulate your phasers!
go into the holo-deck and get a tommy gun!
run!!!!!!!!
-LeaV
I own a pump action golf ball cannon. I made it myself.
"Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."
The analogy seems a little wrong in this case. You throw stuff out because it's trash that you don't want anymore, thus you relinquish all rights to it being yours in the first place.
What Microsoft is doing could be likened more to plagiarism, since Google is doing the work and Microsoft is passing the results off as their own.
Microsoft's beta search engine's index doubled in size to over 8 billion pages.
SIGFAULT
See the results:
http://www.google.com/search?hl=en&q=miserable+failure&spell=1
http://beta.search.msn.com/results.aspx?q=miserable+failure&FORM=QBHP
So I can see how you can distill the entire content of the web that your bot has crawled into a database, but is it possible to pump enough queries into Google to get the entire database? (Or in more mathematical speak: Is this a well posed inverse problem?)
I don't think so. You still have to have your own crawler (to use on the top ranked results of any query). And a good set of queries to hit google with (so you have an idea of what to index)...which changes constantly. Look at Google's zeitgeist some time (link left to some karma whore...)...
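A back-of-envelope version of that question, as a sketch only. All three inputs are assumptions lifted from figures quoted elsewhere in this thread (the 8 billion page index, ten results per page, and the unverified 10,000 queries per IP per day limit), and the calculation takes the scraper's best case, where every query returns ten pages never seen before.

```python
# All inputs are assumptions taken from figures quoted in this thread.
index_pages = 8_000_000_000      # "over 8 billion pages"
results_per_query = 10           # "Results 1 - 10"
queries_per_ip_per_day = 10_000  # the unverified per-IP limit mentioned above

# Best case for the scraper: every query returns ten pages never seen before.
queries_needed = index_pages // results_per_query
ip_days_needed = queries_needed // queries_per_ip_per_day
print(queries_needed, ip_days_needed)
```

Real queries overlap heavily, so the true cost would be far higher than this lower bound, which is the point made above about needing your own crawler anyway.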
Not sure if it's crawling google, but it crawls. Can you say, "sloooowwwww"? As with pretty much everything MS offers, it's a POS.
Well, several people have mentioned that Google simply needs to block the IP ranges of the MSNbot.
:)
Why not have some more fun like what people have previously used to stop evil hotlinkers.
Some lovely mod_rewrites to goatse would go down an absolute treat, imagine the faces on the MSN staff after coming into work expecting to have leeched a stack of google results to find their cache filled with the world famous goatse images
So I do a Google search. The search still lists www.mysite.com/someoldurl.html, but that URL is gone. I click on www.mysite.com/someoldurl.html. I get my custom 404 page, possibly, or a redirect.
This information is then stored in referer logs. Also, if I am using IE search, when I plop from one result to the next, the second one will see the first plop in the referer info. Not the IE search.
Some folks make their logs public, which Microsoft could crawl to find these links.
Just one possibility. I highly doubt they are screen scraping Google results.
Yup, I searched for my name, and I found it in lots of keyservers, that show all the e-mail addresses of the people in my keychain.... say hello to SPAM =)
http://www.google.com/technology/pigeonrank.html
unless the author of this sensational article reviewed their httpd logs for the user agent 'msnbot' clear back to 2003, they have not ruled out the possibility that microsoft's spider simply crawled the site in question before msn search was a tech news feature. brett tabke's webmasterworld forums mention sightings of msrbot from microsoft in april 2003, and widespread msnbot activity starting december 2003. it's also possible that microsoft seeded their search index by licensing it from a comparable index source, e.g. the alexa crawl.
about sean dreilinger
Hey Google, please don't make us read those wacky JPG/GIF letter scrambles with criss-cross lines and input the random characters into a field before submitting a search.
"Hold on a sec while I Goog- Huh? Grrrr.... H... P... 7... O... wait no, 7... zero... ummm...
This one gang kept wanting me to join cause I'm pretty good with a bo staff.
"It's" is short for "it is". Don't use it by mistake when you mean the possessive sense: "its".
It's true that Slashdot and its users are prone to spelling error.
This article is an example of why blogs are worthless ... He never thought of *asking* Microsoft, did he?
Add this to robots.txt:
User-agent: msnbot
Disallow: /
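If you want to sanity-check a rule like this before deploying it, Python's standard `urllib.robotparser` can evaluate it locally. This is a minimal sketch; the `allowed` helper and the example.com URL are made up for illustration. It also shows why the trailing slash matters: an empty `Disallow:` line permits everything.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_text, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)

BLOCK_MSNBOT = "User-agent: msnbot\nDisallow: /\n"
EMPTY_DISALLOW = "User-agent: msnbot\nDisallow:\n"
URL = "http://www.example.com/gallery/"  # placeholder URL

print(allowed(BLOCK_MSNBOT, "msnbot", URL))     # msnbot is shut out
print(allowed(BLOCK_MSNBOT, "Googlebot", URL))  # no rule names Googlebot
print(allowed(EMPTY_DISALLOW, "msnbot", URL))   # empty Disallow blocks nothing
```

Of course this only tells you what a well-behaved crawler should do; it says nothing about whether a given bot actually honors the file.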
I will fight Micro$oft efforts to monopolize another area of the tech industry (to its detriment)
Google: Don't be evil!
Microsoft: Greed is good, greed works!
This isn't surprising. They steal from competitors all the time.
1-Crawl 2-Cnfg 3-ATF 4-Exit ?
This whole article is based on the speculation of a web master who notices that a bot which allegedly isn't leaving behind a bot name is crawling his site. He then figures out that, oh look, there is a standard record in his server log.
And I'm supposed to take this clown's "friend" seriously? That's not a good start, anyway.
But then there's the real howler: the site can allegedly only be found through site: on Google. How does the friend know that? Has he done a complete crawl of the web to find all forward links to any image in his site -- even broken ones? MSNBot, like all bots, recognizes that many anchors are broken, and tries plausible corrections around the broken links. That's particularly useful with a deep link, where the deep link may have timed out but the shallow link still exists.
Obviously my conclusion should be taken as a grain of salt but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the urls it finds. It makes sense from a business case but I wonder if there are any legal issues there. I doubt it.
It's official: 14 year old AOL'ers are now known as "IT journalists."
Call me a literary pedantic, but I don't trust much journalism that includes "LOL".
The old Lie: Dulce et decorum est Pro patria mori
The author admits his conclusions are based off a grain of salt. This article is more like a conspiracy theory than it is news.
William Shatner can then swoop...in, crack... some skulls, and save the... day.
(Followed by sleeping with the green-skinned alien slave girl)
It's interesting to know that Bill Gates has been forced to go back to his roots...
But it's ok for Google to crawl and link to news sites, which is already a legal grey area.
What's the difference?
Microsoft is using all the windows boxes around the world and recovering all the results of the searches done in Google by the machine's owners !!!.
;)
That's another reason to leave windows
All the email addresses a spammer cares to swallow.
Now I'm the grandest Tiger in the Jungle!
Coming soon, on the main Google search page:
Google Search - recommended by Microsoft!
#DeleteChrome
hahaha
What the fuck is this? Take a stab at Microsoft with no evidence what-so-ever? Isn't this more commonly known as slander? I can't believe even Slashdot published this nonsense.
Yes this might sound like a rant, but somehow (partly my fault), the MSN Spider bot found one of my joke cgi scripts that translate pages to my own imaginary language. It's linked nowhere on my site, and maybe 3-4 places on the entire web. Said MSNBot began to pull PDF after PDF through the script, in addition to other large files, it also tried mailto: links. All in all said spider pulled about 1GB of data in a single day. My site's previous average was about maybe 300-400MB a Month. Let's just say that entire M$ IP Netblock was quickly filtered through iptables.
Your hair look like poop, Bob! - Wanker.
Google keeps track of IP addresses and blocks which are doing an unusually high number of searches and disables requests from them.
How do I know? Because a friend of mine decided to find out how common all TLAs are (three-letter acronyms) by counting Google hits on each TLA. This was before the Google API, so he did it with good old fashioned HTTP/HTML. It didn't take long for Google to flag him as evil and block access from his IP block.
Sure, Microsoft could find some way around this-- using different enough IP addresses to conceal the source-- but that's more trouble than it's worth. Worse yet, it sets up a cat-and-mouse game and keeps M$ dependent on Google-- when their stated goal is to beat Google at its own game.
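The kind of per-IP blocking described above can be sketched in a few lines. This is only an illustration of the general idea, not Google's actual mechanism, which nobody in this thread has seen; the class name, the tiny limit, and the day labels are all made up.

```python
from collections import defaultdict

class DailyQueryLimiter:
    """Refuse an IP once it exceeds `limit` queries on the same day."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)  # (ip, day) -> queries seen so far

    def allow(self, ip, day):
        key = (ip, day)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = DailyQueryLimiter(limit=3)  # tiny limit so the demo is easy to follow
decisions = [limiter.allow("192.0.2.10", "day-1") for _ in range(5)]
print(decisions)
print(limiter.allow("192.0.2.10", "day-2"))  # counter starts fresh the next day
```

Spreading queries over many addresses defeats a counter like this, which is exactly the cat-and-mouse game mentioned above.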
I've got a simpler explanation for what the author is seeing. His evidence is based on the fact that some pages being requested exist only in Google's cache. Well, spiders are supposed to do breadth-first searches so they don't hit the same site too often. Microsoft is probably going against data it collected a few weeks ago but hasn't put on its public servers yet. (Why not? Could be lots of things. Maybe they haven't put enough hardware on the front end to support the amount of data they have on the back end. Or maybe they're just slow.)
As much as I'd like to bash M$, there's nothing here that really looks suspicious to me.
It's also clearly still a BETA product
searching for habs@panix.com, my email since '89, turns up NO results, but the following string without the quotes yields results
"habs panix.com"
http://www.hawknest.com/
I say let 'em. If the best Microsoft can do is bite at the ankles of the big dogs, they won't last in this area. That being said, eventually Microsoft will surpass Google, if only because they have an infinite amount of resources to throw at the problem, if google ever stops to rest on their "laurels".
*Condense fact from the vapor of nuance*
I don't own a PS or a PS2. However, I do own a Super Nintendo (: Jackass.
FORTUNE FAVORS IRONY
Now we know the real reason for all the Windows/IE security holes.
MS plans to turn the universe of trojan/worm/virus infected machines into Google surfing, MS search enhancing zombies.
You bastards!
Dowell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own.
My garbage doesn't have a copyright statement, contain my patented technology, nor does it come with terms of service or licensing agreements.
While it is hilarious that SCO takes #1
This isn't by accident.....
http://www.litigiousbastards.com/
Mod points are pointless when you browse at -1.
1. Almost none of my googleADS receive valid hits from GoogleADS content.
2. My content ads spiked recently.
3. MSNbot started crawling like mad at about the same time.
RESULT: Make your own conclusion, but it appears MSN doesn't care jack about the little guy.
There's no place like ~/
Every time I try to do a search all I get is "The server is currently unavailable" Not a good start if you ask me... :)
Google image search: Ugly
Result: some seriously ugly people
MSN image search: Ugly
Result: no ugly people!?!!
Google search: (my IRL name)
Result: all of my USENET posts, my blog, all my sites
MSN search: (my IRL name)
Result: some of my USENET posts
Speed: Google kicks the living yehaa out of MSN.
Result: MSN sucks.
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
I know he says to take it with a grain of salt but I don't buy his evidence at all.
He says that only pages Google returns are getting hit. This only means that Google is the only one to actually return those pages. It is perfectly reasonable that MSN has pages that it doesn't return but might use in future crawls. Maybe it realizes the pages don't exist and therefore doesn't return them, but tries a couple of times to see if the page comes back.
As for pages showing up after someone searches them, maybe they spider specific sites that are searched for that it doesn't have. It's called lazy loading and has been around forever.
Maybe MS is benchmarking itself against Google, though I'm sure if they did it with any serious amount of queries Google would complain about the bandwidth hit.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
which inexplicably renders horribly
Hardly inexplicable if you look at the HTML source for Slashdot, it's an unholy mess.
Ok, so maybe this will be a dumb comment, but I don't mean for it to be a troll...
Why doesn't MSN license Google's engine? I mean, certainly they could work out some nice licensing terms that would avoid putting a link to Google up on their page. They'd still own the eyeballs of probably the majority of people who install Windows and just fire up IE. MSN ads could still be served up to those people, and they could still be directed anywhere that Microsoft wants them to. This saves them the development expense not just for the search engine but also the support infrastructure...
Anyway, this I think would definitely be in Google's interest. Why not in Microsoft's?
Why block them? Just reverse its returned results for any MSN site. Call it anti-leech-technology.
My beliefs do not require that you agree with them.
Is Microsoft finally Crawling down the Gurgler?
Oh, never mind...
Maybe next year
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
So what?
Bot Assisted Blogging
Great. Another problem that can be solved with an .htaccess file. Woo.
I tried searching for "Hello Project" on MSN's search yesterday and got very few of the results I was interested in (a JPop conglomeration called the "Hello Project"). Strangely today, when I tried MSN's beta search, the returns were nearly identical to what Google's were. Too identical to be a coincidence in my opinion, especially since there aren't going to be very many people searching for the Japanese "Hello Project". -Aaron-
Sure Google has a super huge index of the internet but Google uses DMOZ as a starter to get its bots going. Now I would expect that it probably doesn't have to do this anymore since it can rely on its own data and people now submit links directly to them. DMOZ's directory is freely available for download so why wouldn't M$ use that as a jump off point?
Specks
Batteries not included
...crawling up your a*s, Bill Gates will never stop!
"Well since Microsoft has patented TCP/IP it would be obvious who they bought the Internet from."
Think of the ensuing amusement as MS secretly funds SCOogle in order for them to sue the patent holder of TCP/IP for infringement.
Suddenly MS immolates in a blazing ball of fire as $40 billion goes up in a spectacular lawyer-fanned pyre of suing and countersuing themselves into oblivion with all the inevitability of a spent massive star collapsing in on itself.
Why can't google similarly choose to present false or misleading results when queried by their spider? Indeed, it would be extremely fun if we can finally get cowboy niel and goatsx the page hit rating they deserve on msn!
If I were Google, I'd just find out the known IPs that Microsoft crawls from, and then have a special script waiting that will provide all kinds of fake URLs with randomly generated gibberish content in them. That should destroy the quality of the Microsoft search results really quick.
Let's spike the results ourselves and submit each "spiked" page to only one search engine so we know which engine copies which other!
This would help us weed out complete parasites.
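A sketch of how the spiking experiment could be organized, under stated assumptions: each engine gets a page containing a nonsense word derived from a secret, and the word is only ever submitted to that one engine. The function names, the secret, and the engine labels are all hypothetical, and actually fetching result pages is left out.

```python
import hashlib

def marker_for(engine, secret="demo-secret"):
    """Derive a nonsense word that is only ever submitted to `engine`."""
    digest = hashlib.sha256(f"{secret}:{engine}".encode()).hexdigest()
    return "zq" + digest[:12]

def copied_from(results_text, engines, observed_engine):
    """Return engines whose private marker shows up in another engine's results."""
    return [
        engine for engine in engines
        if engine != observed_engine and marker_for(engine) in results_text
    ]

engines = ["engine-a", "engine-b", "engine-c"]
# Pretend engine-b's result page contains the marker we only gave to engine-a.
fake_results = f"... some listing ... {marker_for('engine-a')} ..."
print(copied_from(fake_results, engines, observed_engine="engine-b"))
```

A hit is only suggestive. If the spiked page is linked from anywhere public, another engine could have found it with its own crawler, so the pages would need to stay unlinked for the test to mean much.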
Microsoft is pure dog-ma. FreeBSD is pure cat-ma.
Much like how metacrawler work(s|ed?). I found that quite powerful in its time.
My recent fighting with Googlebot has come to a head when I had to disallow them access to my gallery completely because they refused to honor anything except Disallow: /. I had to go so far as to point Googlebot at my robots.txt and tell it to remove all the previous links.
If you do want to be in the Google index, you could write a script to turn robots.txt into a set of RewriteRules and match on GoogleBot. Yeah, a bit of CPU on your part, and unfortunate, but perhaps a reasonable compromise.
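Roughly what such a script might look like, as a sketch rather than a drop-in tool. It reads robots.txt lines and prints Apache mod_rewrite directives that answer 403 to any user agent containing "Googlebot" for each Disallow path in a group that applies to it. The function name is made up, the `^/?` prefix is there so the pattern works in both server config and .htaccess context, and wildcard extensions such as `*` or `$` in paths are treated as literal characters.

```python
import re

def robots_to_rewrite(robots_lines, bot="Googlebot"):
    """Turn Disallow paths that apply to `bot` into mod_rewrite 403 rules."""
    out = ["RewriteEngine On"]
    applies = False
    last_field = ""
    for raw in robots_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            match = value == "*" or (value != "" and value.lower() in bot.lower())
            # Consecutive User-agent lines share one group of rules.
            applies = match or (applies and last_field == "user-agent")
        elif field == "disallow" and applies and value:
            pattern = "^/?" + re.escape(value.lstrip("/"))
            # A RewriteCond only guards the next RewriteRule, so repeat it.
            out.append(f"RewriteCond %{{HTTP_USER_AGENT}} {re.escape(bot)} [NC]")
            out.append(f"RewriteRule {pattern} - [F]")
        last_field = field
    return out

example = ["User-agent: *", "Disallow: /gallery/", "Disallow: /private.html"]
print("\n".join(robots_to_rewrite(example)))
```

The output can be pasted into the site's Apache configuration, and regenerated whenever robots.txt changes.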
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Cartographers used to do this, so they could tell when people would steal their maps. An extra fake street here or there...
Search Engine Chart
Have you ever been to a turkish prison?
Search engines sell their results all the time. One engine I'm familiar with actually pulls from several other engines when you run a search.
Google and MSN probably have a deal worked out.
Assume I was drunk when I posted this.
General Magic developed an animated desktop for their OS, MagicCap, which used an overarching 'object' and 'place' metaphor with cartoony characters.
Those of us involved with MagicCap at the time thought Bob felt derivative and even less useful.
So keep looking for that Microsoft innovation...
Think about it. It's a classic MS idea!
MS, instead of doing their own work, gets some scripts together to parse google searches into "MS" searches, and they declare a profit, celebrate, watch the stock price rise, and go home happy.
When Google figures out how they're doing it, they get their lawyers, MS gets their lawyers, they meet in court, they settle, MS still makes a profit, Google gets a little something for the effort, and MS still comes out ahead.
20 years of this behavior, and people haven't figured it out. MS put the screws to WordPerfect for 10+ years, and we all saw the settlement from a few days ago. $536 million? That's chump change when you figure out what Office made in the past 10 years.
-- No sig for you!
Perhaps this http://www.google.com/microsoft.html spiked them?
I used Win2k to run Firefox to view MSN Search and searched for Google. Now my PC keeps making sheep noises.
A man decides he's going to set up his new dry-cleaning shop next door to a nunnery, so as soon as it is finished he goes and asks the Mother Superior if she has any dirty habits.
Enjoy.
"Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."
Yeah, that may well be true, but doesn't it make the person digging through your trash seem stupid and poor? I'd think that it would make Microsoft look the same. And a bad reputation may not hurt a company like that (since they already have one), but it sure can damage their little project (the search engine).
"Instant gratification takes too long." - Carrie Fisher
...specifically, the map-making industry. Every roadmap you look at will have at least a few made up roads or landmarks that are used for the sole purpose of catching other companies copying their maps verbatim.
I don't respond to AC's.
Actually, in civilised countries it is illegal for anyone, except for the actual garbagemen, to even touch your garbage.
Google offers a limited-scale usage model to "retail" customers (end users) for free. MS is allegedly consuming Google at a larger scale, for "resale" to end users: that's "wholesale" info consumption. Either Google will limit MS to retail use, like everyone else, or they'll let anyone mine their expensive index. Hopefully they'll stay competitive by exercising the latter, possibly by countermining MS, and improving the Google index in direct comparison to the MS quality. That would be like P2P server competition - sounds like a winner for consumers at every level, except MS.
--
make install -not war
MSN Results
Yahoo Results
Noticed they are the same. Why was this article even posted.
Go to the MSN search engine and type "bill gates is an idiot" (include quotes). Then try the same in Google.
And MSN crawling Google's site is really no different. As long as the Google data is on a public server, it is fair game to crawl.
visit their new search engine and type in: the best operating system, then hit search :-)
microsoft is looking at old pages, google uses a cache...ergo microsoft must be using google.
if we're going to use that kind of logic, I could just as easily come up with "afghanistan is in the middle east and supports terrorists, iraq is in the middle east...ergo, iraq must support terrorists", and use it to make a case for invading iraq...but you don't see......oh wait
It would be absolutely unnecessary for M$. All they have to do is use some of the existing web directories like ODP, Yahoo, Skaffe, JoeAnt, etc as seed and they can find their way to a lot of webpages.
Even if a spider is scanning URLs that were obtained from Google, that doesn't mean that it's being done institutionally by Microsoft. It could just simply be someone fiddling with something.
Hopefully Google has a good licence, which allows them to sue MS into bankruptcy.
But then garcia would have nothing to bitch and whine about. All his post is really doing is to try to get people to click to his site.
yep
Maybe yours doesn't - but theirs did.
-Looking for a job as a materials chemist or multivariat
so if the msn bot does what they say it doesn't do what it's supposed to do.
yep again
Retaliation from google in 1999?
Search for "more evil"
http://beta.search.msn.com/results.aspx?q=more+evil&FORM=QBRE
First result: Microsoft's home page.
...you'll be indemnified. It's cool.
Cheers
Stor
"Yeah well there's a lot of stuff that should be, but isn't"
Afterwards if you suspect copyright infringement (as in they used you as their source, rather than going out and doing all the original work you had to in the first place), you take them to court as follows:
Defendant: No Your Honor, we did not use their copyrighted material to write/draw our own encyclopedia/map. We went out to the same original sources as they did.
Prosecution Attorney: Then pray tell us where you found the original data for this particular article/street.
Defendant: Uh...
Judge: Guilty!
Google has created a new, original derivative work in the process of how they created and organized their database. They should not be required to open it up to known competitors in the process.
I do really wonder however how Google can't be aware of Microsoft IP addresses recently accessing large chunks of their data. Or is MS using stealth IP's not directly registered to them?
Do you suppose Google itself has a robots.txt file?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Uhm, that's assuming that MS is directly querying Google.
Who says that they haven't put a transparent proxy in front of unknowing MSN customers, capturing the results of those customers' queries? Not terribly difficult to implement and, as a side effect, you are looking at a set of search terms that are most popular with your users--ones that MSN Search may want to improve.
Microsoft would NEVER do anything unethical! Anyway, 300 websites should be enough for people to search from.....
-- "Life's not fair, but the root password helps."
I typed in "lostpacket" (my domain) and the first thing that pops up is a slashdot comment I made
... Insightful) by koan (80826) on Thursday September 16, @02:50PM ( #10269238 ) ( http://www.lostpacket.net/ ) Thank you M$ you just gave me the "final straw" to migrate to Linux.
"If any question why we died, Tell them because our fathers lied."
what google has to do is:
... randomly...
run test queries no one would ever do on msn, check their logs for IPs / masks
then write some code to:
1) send results to any query coming from these ranges properly, but with the url link changed to
goatse or tubgirl/etc/etc....
imagine the news:
"msn search was hacked! shows 'tunnel man' as result for searching 'the vatican'"
2) Have an internal google staff meeting in a big room with a huge tv screen showing the news channels...
3) Laugh a lot
4) profit!
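Jokes aside, the "check their logs for IPs" step could be sketched roughly like this. Everything in it is hypothetical: the function name `suspect_ips`, the (ip, query) record layout, the nonsense canary phrase and the sample addresses (taken from the documentation-only ranges) are all made up for illustration, and a real query log would obviously look different.

```python
def suspect_ips(log_records, canary_queries):
    """Return the client IPs that submitted any of the canary queries.

    log_records: iterable of (ip, query) tuples (assumed layout).
    canary_queries: nonsense phrases that were typed only into the other engine.
    """
    canaries = {q.lower() for q in canary_queries}
    return {ip for ip, query in log_records if query.lower() in canaries}


# Hypothetical sample data.
canaries = ["zxqv flurbinated wombat ledger"]
records = [
    ("192.0.2.10", "slashdot"),
    ("198.51.100.7", "zxqv flurbinated wombat ledger"),
    ("198.51.100.8", "ZXQV flurbinated wombat ledger"),
    ("192.0.2.11", "the vatican"),
]

print(sorted(suspect_ips(records, canaries)))
```

Any address that turns up here asked for a phrase that nobody should have known about, which is the whole point of the test queries.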
I for one, welcome our new hot grits... PROFIT!
We're talking about the beta of their new ones, which uses homegrown technology.
The current search.msn.com is still using Inktomi technology (same as Yahoo).
But beta.search.msn.com has the new stuff.
http://www.google.com/catalogs?q=slashdot Google Catalog search...
Sounds an awful lot like PageRank, doesn't it? It's not as if they've ever succeeded by innovating.
I can write a search bot today that completely ignores it and there is nothing wrong with that (except perhaps ethically but even that is arguable)
"Ethically" is the whole point.
You could just as well have said, "I can code an RFID key that will gain me access to your hotel room and there is nothing wrong with that (except perhaps ethically but even that is arguable)"
Hell yeah it's arguable, that's exactly what this story is all about!
MSNBot over the last year or so has started pulling content at a rate so fast many webmasters on PAYG bandwidth have had to block the crawler.
Also, when I complained to the research guys I got a reply back dead quick about how my virtual hosts were misconfigured and one domain wasn't configured with a robots.txt. Compare that to the usual response from a Microsoft monkey, which arrives after 2 weeks and says "please try to reboot".
My guess is that Microsoft are throwing money at the new MSN search and have a good research team on it. Assuming that the archiving/indexing is working well, MSNBot is now indexing much faster than Googlebot and will catch up with or overtake Google if it continues like this. Perhaps it is a key part of the Longhorn-search integration?
I'm sceptical of the MS search. Try a search for x-windows...
Doesn't Google have a policy or some sort of thing so you cannot use their results for your search engine?
What else can you expect? Microsoft invented the "embrace and extend" strategy and they are simply applying that to their search database by scraping Google and tacking on their stuff.
Where it gets sinister is when they appear to "steal" the content. What if the Google page has meta tags or a robots.txt rule specifying the page should not be indexed? What if the terms of service for Google do not permit the unauthorised re-branding or other use of their search results (they do in fact)? Wouldn't you say it's wrong for MS to send the MSNBot a crawlin' then?
I dunno...maybe there isn't a strong defence for Google, but if this is true somehow it seems unethical to build a business on the heavy investment of time and money of others against their will. If they wanted to build on others' work they should use Open Source material like Google did in developing their technology. If they need content they can hunt for it themselves, or get it from a similarly free source (embrace and extend DMOZ directory for example).
I believe in the philosophy of "Free software" and all but I also think that in general you should respect the wishes of others and Google I'm sure wouldn't like this. I also think it would be a bit hypocritical of MS if it did this given its position on sharing its own IP.
Copyright is for CREATIVE works. An automated compilation of other people's works doesn't qualify for copyright protection.
What you say goes against a lot of things previously held by companies about search results, notably that they treat them "hands off". In other words, if Google says they put enough effort into creating their search results they also then can be held responsible for the content. Instead, Google takes the other tack, that they only return links to other people's works.
They aren't 'doing as they please' with your content.
They are linking to your content. No one is stealing your work.
In addition, Google has nothing to say because databases of public information cannot be copyrighted. Their search engine is like a phone book. They may have a case under 'sweat of the brow' but since a spider went out and did the work for them, or the site was submitted to them...
The only other thing they can argue is trespass, and it's obviously not that since it's wide open to the public.
Reproducing a work in its entirety goes way beyond fair use. I was talking about the Google cache, if you bothered to read.
Of course they aren't stealing it, that's not possible, it's not property. However, in the specific circumstances we were discussing Google would be reproducing protected works in violation of the legal protections afforded by copyright law.
Chernobyl 'not a wildlife haven' - BBC News
Why is it that the image search on the new MSN search, when set to turn off filtering, yields less nudity and explicit material (almost none) than the Google search?
MSN
Google
FF 1.0 PR and all lower versions have never had trouble with Slashdot. And I'm not new here.
I am using it on Linux, however.
It's amazing how a group that calls itself "NERDS" is hating on a nerd himself, who nerdily successfully created what all of you nerds strive to create: A monopoly in the form of a company that provides software that everyone in the world has and uses and/or will eventually have to use ...
... I don't get it?
Yet you continue to bitch and moan about whether or not Microsoft is stealing this, or isn't releasing that, or Windows is buggy (which I haven't read since XP was released!)
What if Firefox manages to take over the world and successfully destroy Microsoft in 5 years. Will you then jump on the Microsoft bandwagon?
Make your minds up. If you're truly nerds, you'd be supporting Gates.
The stolen trash analogy doesn't hold up. This is more like going down the neighborhood and picking up all the plastic bags left out for Salvation Army, DAV etc. and putting it all on your porch. It's not stealing, but it's still pretty skeezy.
I'd like to share these few entries from my website, which is getting raped by the msn bot.
2004-11-09 15:17:56 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 14:25:37 permit-1.0.tar.gz 207.46.98.33
2004-11-09 10:32:15 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 06:25:07 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 06:19:18 permit-1.0.tar.gz 207.46.98.33
2004-11-09 02:51:34 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:46:07 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:35:36 MultiplyWithMFC-1.0_src.zip 66.249.64.199
2004-11-09 00:55:05 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 00:48:03 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:10:57 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:05:21 sync.X-1.0.tar.gz 207.46.98.33
2004-11-08 21:03:10 permit-1.0.tar.gz 207.46.98.33
Note that 207.46.98.33 is registered to Microsoft, so let's assume it's the msn bot. Notice that the damn thing blindly keeps on downloading the same file! The requests are just a few minutes apart. The other IP address is Google's bot.
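If anyone wants to put numbers on that, here is a minimal sketch that tallies repeat downloads per IP and file from the entries quoted above. The four-column layout (date, time, file, client IP) is just what those lines look like, and the function name `repeat_downloads` is made up for the example.

```python
from collections import Counter

# The thirteen log entries quoted above: date, time, file, client IP.
LOG = """\
2004-11-09 15:17:56 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 14:25:37 permit-1.0.tar.gz 207.46.98.33
2004-11-09 10:32:15 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 06:25:07 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 06:19:18 permit-1.0.tar.gz 207.46.98.33
2004-11-09 02:51:34 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:46:07 cdp-1.0.tar.gz 207.46.98.33
2004-11-09 02:35:36 MultiplyWithMFC-1.0_src.zip 66.249.64.199
2004-11-09 00:55:05 sync.X-1.0.tar.gz 207.46.98.33
2004-11-09 00:48:03 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:10:57 permit-1.0.tar.gz 207.46.98.33
2004-11-09 00:05:21 sync.X-1.0.tar.gz 207.46.98.33
2004-11-08 21:03:10 permit-1.0.tar.gz 207.46.98.33
"""


def repeat_downloads(log_text):
    """Count how often each (ip, file) pair appears in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _date, _time, filename, ip = parts
        counts[(ip, filename)] += 1
    return counts


counts = repeat_downloads(LOG)
for (ip, filename), n in counts.most_common():
    print(f"{ip}  {filename}  {n}")
```

On these entries it reports five fetches of permit-1.0.tar.gz, four of sync.X-1.0.tar.gz and three of cdp-1.0.tar.gz from 207.46.98.33, against a single fetch from 66.249.64.199.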
[alk]
In my experience, the MSN Search results changed significantly within 10 hours of my completing a search: First search test. Somebody tries the same queries 10 hours later. I'm not sure what to conclude from that, but clearly some magic is happening behind the MSN curtain.
beef up it's new search engine
"its".
I disagree over legal remedies. Since Google's search is offered for free - with no consideration - then the Terms of Service can't be construed to be a contract. A contract requires offer, acceptance, and consideration to be legally enforced. We can agree on anything - "Hey let's meet for lunch, at noon, by the water fountain." And there's nothing I can do if you don't show up...unless I pay you, or offer something, then you would be in breach. Copyright law protects ideas not the actual piece of paper or the publication. Google's idea can't be duplicated without permission. What exactly is Google's idea? Plus, then one could argue that Google is making money off of everyone else's copyrights without any royalty. Nobody goes to a search engine to just look at results. They want to see the information behind those results. The idea of a web directory or search engine isn't protected. Google's search engine software is copyright protected. But the results of a copyright protected tool don't necessarily lead to another copyright. Example: I use Microsoft Word to publish a document. Microsoft's copyright does not extend to my document's ideas, but only to the underlying file format. I tend to agree with the phonebook analogy for this case.
When I first became an editor of the Open Directory Project http://dmoz.org/ I soon ran out of sites to add to my category. So I went over to Yahoo and started adding sites that they had...though I didn't copy the reviews 'cause that wouldn't be right, apart from copyright issues.
Don't see anything wrong in using other people's work to build on.
And he's got their briefcase? Probably recounting the cash before he gives them the papers. He smartened up, that Al Gore.
Actually, if anyone has tried to check out the UI, it's not bad. The advanced search is not on another page (which is a click and so painful) but a javascripty thing called 'search builder' on the same page. Of course it takes ages to download on a shared dialup but good thinking by MS. (there goes my karma)
Type "fuck microsoft" into Microshit's engine and then type "fuck google" into Google.
Which search engine is better? You decide!
---Technology will liberate us if it doesn't enslave us first.
I am the richest and I don't want any one else becoming rich. Nor do I want them doing philanthropy.
Amen
You obviously haven't understood what he's claiming. He's not saying that MSN is screen scraping the results - only that it may be using sites found in google as a list of URLs for the MSNBot to crawl. They are still doing the crawling themselves and building their own index.
Mod parent down.
Microsoft is out to get me! Stop them! God, I get so sick of the whining here about MS and how they are out to subvert, destroy and just fuck everything up. I don't like the company or its products, but god, I don't have delusions about them being an evil empire that is out to get me. Stop the nonsense. It makes those of us who have a shred of common sense, but still don't like MS, look foolish along with the rest of you. Stop the stupid finger pointing and use some sense. MS is a company. They make a product. They want you to use their product so they have to make one better than the next guy to do it. I am sorry to inject some reality here, but they have been making a better product in the last decade than anyone else. That's why they are so big now. Let's spend our time trying to change that instead of jumping up and down screaming like a bunch of freaking idiots and shaking our fingers at them. That isn't going to change anything at all, except to get others to stop listening to us altogether. Let's show them progress and they will come. No one wants to be in the same room with a bunch of fanatical screaming morons.
It worked on me! I am crawling it and completely ignoring his robots.txt WOOOHOOOOO!