Definitely lots of crawling. From my logs, I see that:
GoogleBot crawls me extensively daily
Slurp (Yahoo!) crawls daily, but only a few pages
msnbot behaves much like Slurp
Exabot does an extensive one every few days (this is fairly new for me)
And of course I get the occasional random crawl from some other bot I've never heard of. But Google is by far the most consistent and the most extensive.
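For anyone who wants to pull the same kind of numbers out of their own logs, here is a minimal sketch. It assumes an Apache-style "combined" access log where the User-Agent is the last quoted field, and it matches the bot names from the list above as plain substrings; `count_bot_hits` and `BOTS` are names made up for this example.

```python
import re
from collections import Counter

# Bot names from the list above, matched as case-insensitive substrings (an assumption).
BOTS = ("Googlebot", "Slurp", "msnbot", "Exabot")

# Assumed combined log format:
# host - - [10/Oct/2004:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "referer" "user agent"
LINE_RE = re.compile(r'\[(?P<day>[^:\]]+):[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def count_bot_hits(lines):
    """Return a Counter keyed by (day, bot) for lines whose User-Agent names a known bot."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that don't look like combined-format entries
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                hits[(match.group("day"), bot)] += 1
                break  # count each request once
    return hits
```

Feed it an open log file (`count_bot_hits(open("access.log"))`, with whatever path your server uses) and sort the result to see which bot shows up on which days.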
Eric
Why the Vioxx recall reduced spam (humor)
Absolutely. This is why I always tell people to "think like a librarian" when it comes to finding information in a search engine, whether it be Google or not. That said, I don't know how much is being taught about libraries and library organization these days, so maybe that's a meaningless thing to say.
Eric
How to detect Internet Explorer (as opposed to Firefox)
In response to feedback worried that I'm targeting any non-Firefox browser (I'm not), I've renamed and updated the browser detection page:
How to Detect Internet Explorer
It's amusing to see that the text ads served for the page are now all security-focused...
Eric
Just curious, but are you intentionally targeting Mozilla users to convert them to Firefox?
No. If you read the How to detect Firefox page, you'll see that I kept the test very simple: just look for the string "Firefox/" in the User-Agent header. Trying to distinguish Mozilla from Firefox takes more work, but it can be done. Perhaps for most people it's more a matter of targeting Internet Explorer users by testing for the "MSIE" string.
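The substring tests described above can be sketched in a few lines. This is only an illustration of the idea, not the code from the How to detect Firefox page; `classify_browser` is a made-up name, and the check order (Firefox first, then MSIE) is an assumption.

```python
def classify_browser(user_agent):
    """Classify a User-Agent header value using simple substring tests."""
    if not user_agent:
        return "unknown"  # header missing or empty
    if "Firefox/" in user_agent:
        return "firefox"
    if "MSIE" in user_agent:
        return "ie"
    # Plain Mozilla and everything else lands here; telling them apart takes more work.
    return "other"
```

Note that a Mozilla suite User-Agent has the Gecko token but no "Firefox/" token, so it falls through to "other" rather than being treated as Firefox.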
Eric
Google AdSense Tips
What if the User-Agent header is missing? Does PHP return an empty string for $_SERVER['HTTP_USER_AGENT'] or will it cause an exception?
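As far as I know, PHP doesn't throw an exception here: reading a missing array key gives you null plus a notice (a warning in newer versions), so checking with isset() first is the safe route. The same defensive idea, sketched in Python and assuming a WSGI/CGI-style environ dict (which uses the same HTTP_USER_AGENT key); the function names are made up for this example.

```python
def get_user_agent(environ):
    """Return the User-Agent from a WSGI/CGI-style environ, or "" if the client sent none."""
    return environ.get("HTTP_USER_AGENT", "")


def is_firefox(environ):
    """True only when a User-Agent header is present and contains "Firefox/"."""
    return "Firefox/" in get_user_agent(environ)
```

Defaulting to an empty string means the substring test simply fails for clients that send no header, instead of blowing up.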
Eric
Ugh. It's one thing to put up some text and some links promoting Firefox, it's another to annoy the user with popups and other spam-like tactics. What about users who are forced (by corporate policy, for example) to use IE? Be gentle, please...
Eric
William Shatner and All-Bran
You're quite right about the Vary header, and I've updated the page (and the header viewer) accordingly, thanks: How to detect Firefox.
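For anyone wondering why the Vary header matters: if the page content changes based on the User-Agent, caches need to be told so, or an IE user might get the cached Firefox version (or vice versa). A minimal sketch as a bare WSGI app; the messages and the `app` name are placeholders, not what the actual page serves.

```python
def app(environ, start_response):
    """Minimal WSGI app that varies its body by browser and tells caches so."""
    agent = environ.get("HTTP_USER_AGENT", "")
    if "Firefox/" in agent:
        body = b"Thanks for using Firefox!"
    else:
        body = b"Have you considered Firefox?"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The body depends on User-Agent, so caches must key on that header too.
        ("Vary", "User-Agent"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The standard library's `wsgiref.simple_server.make_server("", 8000, app)` is enough to try it locally.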
Eric
Reading C Declarations: A Guide for the Mystified
to detect non-Firefox browsers (or at least non-IE)
In case it's not obvious, I meant to say "or at least IE".
Eric
JavaScript is not Java
Now be sure to change your web pages to detect non-Firefox browsers (or at least non-IE) and encourage them to upgrade to Firefox. I've documented the basic technique here: How to detect Firefox and See the headers you're sending.
Eric
Why the Vioxx recall reduced spam (humor)
1. What do you think about Shatner's new album, Has Been?
2. Do you think Shatner is a spokesman for All-Bran?
Eric
JavaScript is not Java
Google doesn't index as thoroughly or as often as Yahoo, a search engine that's trying very hard to increase their search capabilities and that includes image searching.
Hmmm... funny, but I have the opposite experience. Google is constantly indexing my site, but Yahoo! just slurps a few pages. I think the session IDs in the URLs throw it off, so I should fix that. On the other hand, MSN's bot has been pretty good about indexing the site, though not as regularly as Google's. And lately there have been some other bots showing up that I've never seen before -- more proof that the search market is heating up.
There's still lots of room for improvement. You still have to wade through a lot of crap to find what you're looking for if you don't know the exact search terms to use. That's the skill you need to hone for effective searching. Think like a librarian!
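On the session IDs mentioned above: one common fix is to keep them out of the URLs that crawlers see. Here's a rough sketch of stripping them from a link before it is emitted. PHPSESSID is PHP's default session name; the other parameter names and `strip_session_id` itself are hypothetical, so substitute whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# PHPSESSID is PHP's default; "sid" and "sessionid" are hypothetical examples.
SESSION_PARAMS = {"PHPSESSID", "sid", "sessionid"}


def strip_session_id(url):
    """Return url with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With the session parameter gone, every visit by a bot sees the same URL for the same page, which is what a crawler needs to recognize it has already been there.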
Eric
Deploying Java Applets (old, but still useful)
Google's servers are probably too busy indexing text to spend much time on images. A couple of weeks ago I started setting up a Vioxx information site and I submitted my URL to Google for indexing, not expecting the pages to show up in the index for quite a while. The GoogleBot made its first appearance one day after my site went live, and the site showed up in the index just a couple of days after that. I bet they're just not devoting much horsepower to images while they try to keep up with the normal text stuff.
Eric
How to detect Firefox
You think that's bad, try keeping everything straight with J2ME (Java 2 Micro Edition) programming. I put the most common acronyms up in my J2ME section:
J2ME acronym list
Eric
BlackBerry programming stuff
If you're an application developer, first consider writing the apps using J2ME. This isn't always possible, I know, but if you can do it in Java then you'll have compatibility across a wide variety of Nokia (and other) devices.
Eric
J2ME info here: Eric's J2ME Pages
8) Don't dig into the ground ....
9) Step carefully after it rains
10) Stay away from bait shops
11)
12) Profit!
(Sorry, couldn't resist...)
Eric
Why the Vioxx recall reduced spam (humor)
Oh great, my dogs already like to grab the popcorn flying out of the air popper... now they'll be hot to trot for my data as well!
Still, I suppose this will be better than finding bits of shiny plastic in the doggie doo-doo:
Five-year-old daughter: Papa, have you seen my toy?
Me, staring at colorful deposit in backyard: Umm, no...
Eric
Why the Vioxx recall reduces spam (humor)
Speaking of Amazon, yesterday they unveiled their new Simple Queue Service, their latest foray into web services. They're exposing some of their infrastructure in order to let you share data between distributed components. Free for the time being, though limited in terms of how much data you can queue at once.
Of course, someone will probably sue them over this, too.
Eric
William Shatner boldly goes like no man has gone before
Given the success of the Spread Firefox campaign, is there an equivalent campaign for Thunderbird in the works?
Eric
Why the Vioxx recall reduced spam (humor)
No. Consumer DVD burners cannot burn the CSS key data required by the studios.
Yeah, but I wish you could. Or somebody could. I mean, when you go to Blockbuster and they're out of the movie you want, why can't they just burn a new copy on the spot for you to rent?
Eric
How to Detect Firefox
Probably because hard disk capacities are so large and DVD burners are now pretty much standard equipment on PCs. There must be a corresponding increase in movie pirating; critical mass must have been reached.
Me, I wish they had a "burn on demand" (BOD) model where you pay a minimal fee (think rental cost, ideally cheaper) and get to burn a movie on DVD. No case, no extras, just the movie.... I guess video-on-demand is almost the same...
Speaking of lawyers: Vioxx is Prozac for lawyers
Man, did you ever date yourself with that one.
You want to feel old, talk to some current university students. My wife is a part-time marketing lecturer who finds that the students rely on the Web to find information. (Because, as you know, if it's on the Web then it must be true.) They simply have no concept of a world without the Internet, which makes dealing with pre-Internet business cases very interesting for them as they try to figure out where to get the information they need.
Eric
How to masquerade your browser (short answer: Get Firefox!)
I remember watching someone try voice input on their TRS-80 (this is going waaayyy back)... "One... no... one... no! One! NO! ONE! AAARGGGH!" I know things have improved since then, but I'm not sure how accurate you could make it on such a small computing device. And the problem with voice input on a phone is that it's too easy to throw the phone against a wall after it misinterprets you for the hundredth time :-)
Eric
How to Detect Firefox
Hmm.... this wasn't the spam reduction I had in mind when I wrote about the Vioxx recall!
Eric
How to detect Firefox
There's little incentive to withhold information, really, because I doubt there are any real "trade secrets" to worry about. Many tech books are written more as a way of increasing the author's (or his company's) profile in the field. If you're a consultant, it's another way to get leads and to impress potential clients. You don't do it for the money, trust me...
Eric
Why I hate Bell Mobility
... didn't Keanu Reeves originate the form?
Eric
Why Vioxx is like Prozac for lawyers