Greatest Task of Web 2.x: Meta-Validation
CexpTretical writes "This Technology Review article about Web 2.x problems fails to mention the 800 pound gorilla in the room when it comes to fulfilling the dreams of the Semantic Web — i.e., assumptions about the validity of metadata or tagging schemes. We can add all of the metadata and/or tags we want to web resources but that does not mean that the 'data about the data' honestly or accurately describe the resource or are 'about the data' at all. This is why Google does not place much importance on the metadata already contained in HTML document headers for search ranking, because it cannot be trusted. And to validate it would require more effort than to search and index that data from scratch. Ensuring or verifying the validity of metadata would be a task equal to that of initially creating it, but would have to be repeated on an ongoing basis. Hence all of the talk about 'trusted networks,' which then require trusting the gatekeepers of those networks. Talk about 'semantics.'" Slashdot's moderation and meta-moderation offer one example of getting useful metadata in a non-trusted environment.
You will be fine if, at the lowest, core level, you store your document data as XHTML -- or better, XML with an appropriate, fine-tuned schema.
...but you *must* start by having the data stored in a micro-addressable format -- i.e., XML.
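To make "micro-addressable" concrete, here is a minimal sketch using Python's standard-library ElementTree. The document, its element names, and its ids are all made up for illustration; the point is only that any single piece of the data can be reached by a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names and ids are invented for this sketch.
DOC = """
<article id="a1">
  <title>Meta-Validation</title>
  <section id="s1">
    <para id="p1">Metadata cannot be trusted blindly.</para>
    <para id="p2">Validation costs as much as indexing.</para>
  </section>
</article>
"""

root = ET.fromstring(DOC)

# Address one paragraph by its id, without touching the rest of the document.
para = root.find(".//para[@id='p2']")
print(para.text)

# Address every paragraph inside one specific section.
texts = [p.text for p in root.findall("./section[@id='s1']/para")]
print(texts)
```

ElementTree only supports a subset of XPath, but that is enough to show the idea: once the data is structured, each fragment has an address.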
XSL makes it *trivial* to translate that data into HTML (or to leave it as XHTML, if the browser or client device supports it).
This translation can be pre-rendered, or done in real time as a page or document is rendered and served. (Yes, XSL is fast enough.)
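A rough sketch of the kind of translation meant here. Note that this is *not* XSL: Python's standard library has no XSLT processor, so a small tag-mapping function stands in for the template rules a stylesheet would hold. The source document and the TAG_MAP mapping are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data; the schema is invented for this sketch.
SOURCE = (
    "<article><title>Meta-Validation</title>"
    "<para>Tags can lie.</para><para>Checking them is costly.</para></article>"
)

# Hypothetical mapping from source elements to HTML elements, playing the
# role that template rules would play in an XSL stylesheet.
TAG_MAP = {"article": "div", "title": "h1", "para": "p"}

def to_html(node):
    """Recursively copy a source element into the corresponding HTML element."""
    out = ET.Element(TAG_MAP.get(node.tag, "span"))  # unknown tags fall back to <span>
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_html(child))
    return out

html = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode", method="html")
print(html)
```

The same function could run once at publish time (pre-rendered) or on every request (real time); nothing in the data itself changes either way.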
Done right, you can future-proof the data that underlies your pages, documents, and user interfaces. You can share the same data between PDFs, Flash interactives, and web pages, as necessary. You could even translate that data into other XML -- say, if you improve or extend your schema.
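As an illustration of the "translate into other XML" case, here is a small sketch that moves data from one made-up schema to an extended one. Both schemas, the schemaVersion attribute, and the migrate function are hypothetical; in practice an XSL stylesheet would typically do this job.

```python
import xml.etree.ElementTree as ET

# Hypothetical version-1 document: title and paragraphs sit side by side.
OLD = "<article><title>Meta-Validation</title><para>Tags can lie.</para></article>"

def migrate(old_root):
    """Build a document in a hypothetical version-2 schema from version-1 data."""
    new_root = ET.Element("article", {"schemaVersion": "2"})
    # Version 2 (invented here) separates metadata from content.
    head = ET.SubElement(new_root, "head")
    ET.SubElement(head, "title").text = old_root.findtext("title")
    body = ET.SubElement(new_root, "body")
    for para in old_root.findall("para"):
        ET.SubElement(body, "para").text = para.text
    return new_root

new_xml = ET.tostring(migrate(ET.fromstring(OLD)), encoding="unicode")
print(new_xml)
```

Because the old data was already structured, the migration is mechanical: nothing has to be scraped back out of presentation markup.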
Make that data your bi*tch, and it'll do what you want.