If we added an extension to the robots.txt file that said "look here for the content", and then published search-relevant static content derived from the dynamic content to those pages, any visiting spider could use this (otherwise inaccessible) static content to assess the site. This might work.
This would work perfectly if the web were still Academic in nature. However, now that Business has got its grubby little hands on the web, it is no longer feasible. The first thing that would happen is that porn sites would derive static content that was of interest to regular searchers but did not in any way reflect their actual content, in the same manner that META tags are abused today. Misrepresentation of every sort is endemic on the web. I deal with it daily at work.
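To make the idea concrete, here is a rough sketch of how a spider might read such an extension. The `Static-Content` directive, the function name and the example paths are all made up for illustration; nothing like this exists in the actual robots.txt convention.

```python
from typing import List


def find_static_content_hints(robots_txt: str,
                              directive: str = "static-content") -> List[str]:
    """Return the values of every (hypothetical) Static-Content line."""
    hints = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value survive.
        field, value = line.split(":", 1)
        if field.strip().lower() == directive and value.strip():
            hints.append(value.strip())
    return hints


# Hypothetical robots.txt using the invented directive.
example = """\
User-agent: *
Disallow: /cgi-bin/
Static-Content: /search-summaries/index.html
"""

print(find_static_content_hints(example))
```

A spider that understood the directive would fetch the listed pages and index them in place of the dynamic pages it cannot reach. Note that the spider has no way to check that those pages match what the site really serves.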
The only reliable methodology is to actually index what is visible on the web. If this means we have to figure out how to do dynamic queries when spidering then so be it. Implementing this is another matter :)
One mistake I notice a lot here on /. is people thinking that when they run a search query, the search engine actually looks at websites. This is an elementary but important mistake. When you query a search engine, all you are searching is its index of the portion of the web that it has spidered previously. The more recently a site was spidered, the more current its results. Actually searching the web in response to a query would generally be far too time-consuming to satisfy most users. Do you want to make a query and get the results back hours later?
:)
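A toy illustration of the point about indexes, assuming a tiny in-memory "crawl". The URLs, page text and function names are invented for the example, and real engines are vastly more elaborate. The query only ever touches the index built at spidering time, so a page published afterwards is invisible until the next crawl.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose crawled text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, using the index only."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Pages as they looked when the spider last visited (made-up data).
crawled = {
    "http://example.com/a": "Red Hat buys a compiler company",
    "http://example.com/b": "Spiders index static pages",
}
index = build_index(crawled)

# A page published after the crawl: it is on the web but not in the index.
crawled["http://example.com/c"] = "brand new static page"

print(search(index, "static pages"))
print(search(index, "brand new"))
```

The second query comes back empty even though the new page exists, because no website is contacted at query time.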
Just my thoughts...
(Note: my views are not necessarily those of my employer)
Phrogman
Cybrarian@maplesquare.com
This makes perfect sense. Red Hat seems to be positioning itself to be a major player in the high-tech industry, not just the 'Alternative OS' market. Rumour has it they may be getting ready to purchase Corel as well. With Mozilla to provide a browser platform and Corel's WordPerfect Office suite to provide an office environment, they would be well placed to compete with Microsoft (although they will have a long uphill climb in that battle). Looks like they have decided to leverage all that money they collected from their IPO to place themselves in direct competition with the big boys.
I would expect to see them purchase a few more companies to round out their holdings and widen their market.