Indexing Dynamic Sites For Search Engines?
Moeses asks: "I am working on a Web site that uses the Altavista search engine software. The latest version of the site has moved most of the data from static pages to dynamic pages. This causes some issues to arise, but I've developed workarounds for most of them. For example, I generate pages with URLs that contain all the query string information so the whole database gets indexed, and I have code to handle situations where a user searches for something that can't be displayed because of some state information specific to that user's session. There are still enough issues that I can't index all the states of the files that I need. Building a custom search engine for the database isn't within the budget of this project. What are you others doing to index and search your dynamic sites?"
i have this problem as well. so i did the halfway solution. i wrote a simple script that iterates through the dynamic pages outputting them as static html files. these files are submitted to the search engines.
the problem is of course the pages end up getting old. no problem, add a little "this is an archived version of this page, please click here for the newest version" message. rerun the script when necessary.
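A minimal sketch of that kind of snapshot script, in Python. Everything named here is an assumption for illustration: the `render` callable stands in for however you fetch or generate a dynamic page, `live_url_for` builds the link back to the live version, and the `news_<id>.html` naming is made up. The original script may well have worked differently.

```python
import tempfile
from html import escape
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical banner, echoing the "archived version" note described above.
ARCHIVE_NOTE = (
    "<p><em>this is an archived version of this page, please "
    '<a href="{live_url}">click here</a> for the newest version</em></p>'
)


def snapshot_pages(
    ids: Iterable[int],
    render: Callable[[int], str],
    live_url_for: Callable[[int], str],
    out_dir: str,
) -> List[Path]:
    """Write one static HTML file per dynamic page and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page_id in ids:
        page = render(page_id)  # in real use: fetch the dynamic page
        note = ARCHIVE_NOTE.format(live_url=escape(live_url_for(page_id), quote=True))
        # Put the note right after <body> when there is one, else at the top.
        if "<body>" in page:
            page = page.replace("<body>", "<body>\n" + note, 1)
        else:
            page = note + "\n" + page
        target = out / f"news_{page_id}.html"
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Stand-in renderer so the sketch runs without a web server.
    def fake_render(page_id: int) -> str:
        return f"<html><body><h1>story {page_id}</h1></body></html>"

    with tempfile.TemporaryDirectory() as tmp:
        for path in snapshot_pages(
            [1, 2, 3], fake_render, lambda i: f"/news.php?id={i}", tmp
        ):
            print(path.name)
```

Rerunning it simply overwrites the old snapshots, which matches the "rerun the script when necessary" approach.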
i did this and was able to submit all my dynamic pages to altavista. what i also did was add an additional little "prev | next" link at the bottom, so a spider could start at one page and follow links to the end. i went further and created a hallway page to submit to altavista.
also, the pages are flat so they tend to load faster than dynamic ones.
check out the page i submitted to AV, an old archived page (contains the prev|next links @ bottom), or the live homepage
NEWS: cloning, genome, privacy, surveillance, and more!
use mod_rewrite to make your dynamic pages look like static html.
An example:
you have a script called news.php and a news index id (e.g. news.php?id=42).
You could map that to
news_id42.html with
RewriteEngine on
RewriteRule .*_id(.*)\.html$ news.php?id=$1
in your .htaccess.
Voila! your dynamic content looks exactly like a static html page.
Another one is to fool search engines into thinking that the script is a directory:
foobar.php/param1/param2/
Works perfectly fine ... ;) but this technique doesn't seem to work with all search engines (don't remember which ....)
regards,
Michael
Samba Information HQ
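To see what the two URL tricks in the comment above do, here is a small Python sketch that mimics them outside Apache. It is not how mod_rewrite is implemented. It reuses the regex from the RewriteRule as-is, and `rewrite` and `split_path_info` are hypothetical helper names. In a real setup the second trick relies on the server handing the trailing path to the script (PATH_INFO), which is assumed here rather than shown.

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in the RewriteRule from the comment, copied unchanged.
REWRITE_PATTERN = re.compile(r".*_id(.*)\.html$")


def rewrite(path: str) -> Optional[str]:
    """Map a static-looking name to the dynamic URL, or None if no match."""
    match = REWRITE_PATTERN.search(path)
    if match is None:
        return None
    return f"news.php?id={match.group(1)}"


def split_path_info(
    request_path: str, script_name: str = "foobar.php"
) -> Optional[Tuple[str, List[str]]]:
    """Split '/foobar.php/param1/param2/' into the script and its parameters."""
    marker = "/" + script_name
    pos = request_path.find(marker)
    if pos == -1:
        return None
    rest = request_path[pos + len(marker):]
    if rest and not rest.startswith("/"):
        return None  # e.g. '/foobar.phpx' is a different file
    params = [part for part in rest.split("/") if part]
    return request_path[: pos + len(marker)], params


if __name__ == "__main__":
    print(rewrite("news_id42.html"))
    print(split_path_info("/foobar.php/param1/param2/"))
```

Note that `(.*)` accepts anything between `_id` and `.html`. If the ids are always numeric, a tighter group such as `([0-9]+)` would be a reasonable variation, though that is a change from the rule as posted.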