Independent Data and Formatting with Microformats
IdaAshley writes to tell us IBM DeveloperWorks is running an article about how to best utilize microformats to embed data within standard XHTML code. From the article: "Microformats are a pragmatic approach to solving the issue of structured data on the Web. Is it as architecturally pure as XML-encoded data separated from its formatting through a mechanism such as XSLT style sheets? No. But I think this approach is a realistic middle step that will help build a more intelligent Web that is easier to use and provides better search and data integration."
Some of us have been doing this for YEARS. At least now we have a buzzword for it.
--I'm so big, my sig has its own sig.
-- See?
Why shouldn't they have an in-house browser? They do have Lotus Notes.
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
Get off your hobby-horse, Jorn. At some point, please realise that you are clueless about markup. Only then will you be able to learn a bit about what you are so high-and-mighty about.
Firstly, <meta> is an element type, not a header. It doesn't do your credibility much good when you don't even know what it is.
Secondly, <meta> is an astonishingly limited element type. It's scoped to the page not particular parts of it, and it has a plain-text content model because it uses attributes instead of child elements.
Thirdly, I anticipate you saying that you could fix this by changing the <meta> element type. Sure you could. You could fix it by changing it to a set of element types that describe content more accurately and changing it so that it could appear in other parts of the document. And you know what you'd have then? The structured HTML that you despise so much. That's right, microformats embody the very thing you are criticising.
Finally, given that HTML hasn't changed recently to allow microformats, everything that is possible today with microformats was possible five years ago with microformats. It's a design strategy, not a new technology.
Again, please learn a bit about something before you turn your nose up at it. You might be smart in other respects, but when it comes to markup, you are dumb. Please accept this so you can change it.
I'm sure the LISP community would love to hear about this brand-new idea of embedding specialty, or domain-specific if you will, languages and data. How extraordinarily novel.
You'll be running a limited LISP implementation on every browser in no time!
This suffers from the same thing XML did. Remember when XML was going to revolutionize communication between computers by structuring everything consistently? Then tripped over which was crawling on the floor after being decked by who was rather pissed off after an argument with Henry</name> and the whole thing went down in a pile of flames and is now relegated to being a 2MB configuration parsing library to embrace and extend "option=value".
So now why is this "vevent" class special, and who decided it would be "vevent" and not "scheduledevent" or "calendarevent" or "microsoftcalendarhassomethingforyoutodotoday"? Clearly as a human I can look at "dtstart" and think about it and realize that this means the starting date, but how does a computer know this? If the "semantic web" is going to take off, then we need semantics, and pronto.
Hopefully any standardization doesn't turn into a nightmare though. I used to develop in the healthcare insurance claims field, and the old NSF format for transmitting an insurance claim electronically was a horrible death-by-committee piece of work. It was as if nobody could come to a consensus and the committee decided to just throw everything in. You might look at your insurance card and think "gee I have an insurance ID number" but no, in the NSF, there were about 10 different blanks for insurance IDs, depending. Is it a Medicare number? Then it goes in the Medicare blank. God forbid the computer would have just one blank and assume that if you're billing Medicare then the number in the blank is probably a Medicare ID. Medicare was easy, there's just one. Medicaid in most states has a billion subcontractors, all with names that have nothing to do with "medicaid", so you simply had to maintain a magic list of insurance plans, changing every other year or so, that used the Medicaid ID field. Or the separate fields for Blue Cross and Blue Shield. What about the states where you have BCBS as a single entity?
Anyway, I'm digressing (and ranting about a chunk of my life I'd much rather forget). What's important in standardizing semantics is identifying everywhere that things are identical and reusing semantics whenever possible. Decisions have to be made up front as to what the relationship is between "name" and "last name" (people have a name, which has a last name, yet companies have names that typically don't have a last name. What about a cat named "John K. Wibblesworth"? How is that different from one named "Tama"?) Yet, take dtstart, which is used here for a calendar event. Should we have "dtclassstart" for the first day of school?
The difference between this and text tagging is that this has a set structure.
"Love is like a trampoline, first it's like "SWEET!!" then it's like *BLAMM!*"
Ok, so this "microformats" thing is about encoding extra data inside an HTML file by abusing CSS class names for markup. Isn't that completely unnecessary and nothing more than an ugly hack? Don't we have XML namespaces for exactly that reason? Wouldn't something like:
<span style="display: none">
<vevent:event>
<vevent:dtstart>20060501</vevent:dtstart>
<vevent:dtend>20060502</vevent:dtend>
<vevent:summary>My Conference opening</vevent:summary>
<vevent:location>Hollywood, CA</vevent:location>
</vevent:event>
</span>
be the 'right'[tm] way to do it?
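A minimal sketch of how a consumer might read the namespaced markup proposed above. The post never binds the "vevent" prefix to anything, so the namespace URI here is a hypothetical placeholder, and `read_events` is an illustrative helper rather than part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the post does not say what "vevent" is bound to.
VEVENT_NS = "http://example.org/ns/vevent"

DOC = f"""
<span xmlns:vevent="{VEVENT_NS}" style="display: none">
  <vevent:event>
    <vevent:dtstart>20060501</vevent:dtstart>
    <vevent:dtend>20060502</vevent:dtend>
    <vevent:summary>My Conference opening</vevent:summary>
    <vevent:location>Hollywood, CA</vevent:location>
  </vevent:event>
</span>
"""


def read_events(xml_text):
    """Return one dict per vevent:event element, keyed by local element name."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for event in root.iter(f"{{{VEVENT_NS}}}event"):
        # ElementTree reports namespaced tags as "{uri}local"; keep only "local".
        events.append(
            {child.tag.split("}", 1)[1]: (child.text or "").strip() for child in event}
        )
    return events


events = read_events(DOC)
print(events)
```

Note that this only works with an XML parser and a declared namespace; without the `xmlns:vevent` declaration the document is not namespace-well-formed and the parse fails.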
If the "semantic web" is going to take off, then we need semantics, and pronto.
as:
If the "semantic web" is going to take off, then we need semantics, and porno.
That is all.
The article mentions the wiki, but doesn't link to it, except at the very bottom of the resources section.
None of it. META tags and microformats serve two entirely separate purposes, and neither is in any way a replacement for the other.
Very little. For instance, if I had a full-page calendar display, then because META is scoped to the whole page I couldn't include an event record for each individual event; I'd have to send the person to a 'more information' link and give the event information there. If I wanted them to do that, I could've just given them an iCal file. This allows the semantic marking to sit alongside the formatting presented to the user. (We would assume that the person wouldn't want to pull down all events from the calendar; think of something like registering for classes in college, where you might only want one or a few from the full list of events.)
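To make the scoping point concrete, here is a rough sketch of pulling several event records out of a single page by class name, using only Python's standard `html.parser`. The page content, the `VEventParser` class, and the short property list are illustrative assumptions; this is not a complete hCalendar parser and it assumes the events contain no void elements such as `<br>`.

```python
from html.parser import HTMLParser


class VEventParser(HTMLParser):
    """Collect a few hCalendar-style properties for each class="vevent" element."""

    PROPERTIES = ("summary", "dtstart", "dtend", "location")

    def __init__(self):
        super().__init__()
        self.events = []          # one dict per event found on the page
        self._depth = 0           # current element nesting depth
        self._event_depth = None  # depth at which the open vevent started
        self._prop = None         # (name, depth) of the property being read

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if "vevent" in classes and self._event_depth is None:
            self.events.append({})
            self._event_depth = self._depth
        elif self._event_depth is not None and self._prop is None:
            for name in self.PROPERTIES:
                if name in classes:
                    self.events[-1][name] = ""
                    self._prop = (name, self._depth)
                    break

    def handle_data(self, data):
        # Text only counts while a property element is open.
        if self._prop is not None:
            self.events[-1][self._prop[0]] += data

    def handle_endtag(self, tag):
        if self._prop is not None and self._prop[1] == self._depth:
            self._prop = None
        if self._event_depth == self._depth:
            self._event_depth = None
        self._depth -= 1


# Hypothetical class-registration page with two events in one document.
PAGE = """
<ul>
  <li class="vevent"><span class="summary">Intro to Markup</span>,
      <span class="dtstart">20060501</span>, <span class="location">Room 101</span></li>
  <li class="vevent"><span class="summary">Advanced Markup</span>,
      <span class="dtstart">20060508</span>, <span class="location">Room 202</span></li>
</ul>
"""

parser = VEventParser()
parser.feed(PAGE)
for event in parser.events:
    print(event)
```

Each list item yields its own record, which is exactly what a page-scoped META element cannot express.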
And many times, even when there is a single event mentioned within a document, it would not be semantically correct to say that the event applies to the entire page -- it may only be a section of the page that is relevant to the event. (eg, the front page of a website, with info about a company, and then an upcoming event announcement)
I personally didn't like the examples given in the IBM article. Some of the past examples that I've seen include embedding semantic detail within a paragraph of text (eg, a movie review), so that different review formats could then be processed in an automated way.
Build it, and they will come^Hplain.
Mixing presentation and data - good... bad... good. But it gets better a little, each time (maybe more of a spiral than a wheel).
We're using them on aim pages for module development (I cover it a bit here). It's a nice simple standard, and the idea needed SOME name - don't make more of it than it is.
-----
graphically speaking
This is a kind of neat idea, except, of course, if I have CSS that does something with, oh, say, a class of "dtstart". Sure, it's easy to recognise that ".vevent > .url > .dtstart" is a microformat data item for an hCalendar, but if I'm already using "dtstart" or "url" regularly in my markup so I can apply styles to those kinds of things, I'm pretty much SOL. Rewrite all your markup and CSS to stop using those names.
There's no namespacing. There's not even an ATTEMPT at namespacing. This will fast become an unmanageable hodge-podge of insanity, with common words used willy-nilly in class attributes.
The class attribute is defined as CDATA. That's it. You can use pretty much ANY character in it. There's a lot of characters that can't be used in a CSS selector, though, such as ":". See where I'm going with this? <div class="mf:vevent"> for a start. Better yet, <div class="hidden mf:vevent"> such that you can hide (or format) the block of data separately.
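A small sketch of what the prefix idea could look like on the consuming side. The `mf:` prefix comes from the example in the comment above; no such convention is part of the published microformats, and `split_classes` is a hypothetical helper.

```python
MF_PREFIX = "mf:"  # prefix from the comment's example, not an established convention


def split_classes(class_attr):
    """Split a class attribute into (microformat names, ordinary style classes)."""
    microformat, styling = [], []
    for token in class_attr.split():
        if token.startswith(MF_PREFIX):
            microformat.append(token[len(MF_PREFIX):])
        else:
            styling.append(token)
    return microformat, styling


print(split_classes("hidden mf:vevent"))
print(split_classes("url dtstart"))
```

With a prefix, an author's existing `url` or `dtstart` style classes would never be mistaken for microformat data, which is the collision the comment is worried about.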
Now, as if that wasn't bad enough, and, trust me, it IS bad enough, there's also the misuse of the "title" attribute and the "abbr" element. A machine formatted date is not the expanded version of a human formatted date, which is not an abbreviation. A renderer trying to make sense of <abbr class="dtstart" title="10034134134T00">17th Smarch</abbr> will think "AHA! This here is an abbreviation, I will provide unto the user some means to see what that '17th Smarch' abbreviation stands for!" Usability disasters follow.
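For reference, a rough sketch of how a consumer might treat the abbr pattern being criticised: the machine value is taken from the title attribute when the element is an abbr, and from the text otherwise. `DtStartParser` and the sample markup are assumptions for illustration; the sketch also assumes the dtstart element does not contain a nested element with the same tag name.

```python
from html.parser import HTMLParser


class DtStartParser(HTMLParser):
    """Collect (machine value, human text) pairs for class="dtstart" elements."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._open = None  # [tag, title, text] for the dtstart element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._open is None and "dtstart" in (attrs.get("class") or "").split():
            self._open = [tag, attrs.get("title"), ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            open_tag, title, text = self._open
            text = text.strip()
            # For <abbr title="...">, the title carries the machine-readable value.
            machine = title if (open_tag == "abbr" and title) else text
            self.values.append((machine, text))
            self._open = None


SAMPLE = (
    '<p><abbr class="dtstart" title="20060501">May 1st</abbr> to '
    '<span class="dtstart">20060502</span></p>'
)

parser = DtStartParser()
parser.feed(SAMPLE)
print(parser.values)
```

The same title attribute that feeds the parser is what a browser or screen reader would present as the expansion of the abbreviation, which is the usability problem described above.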
So, in summary, this is the worst idea I've seen in HTML space since some bright spark said, "let's suggest that people use the 'text/html' content type for their XHTML markup!"
I do like the idea of being able to move XML around without having to parse to view the basic file in a formatted fashion. So, you're mixing HTML with a tag. Again, SO WHAT? But what about the encapsulated text, what's the point?
To make things application parsable. Try reading the article before complaining that you don't see the point.
If you're going to use a viewer eventually
If you'd bother to read the article, which is about comparing one application parsable format (iCal) to the new microformat, you'd understand that the web is moving towards human-readable things being software-readable too.
(because you have the encapsulated text)
That's like referring to a car as a pile of steel and glass: it completely ignores the purpose of something in favor of describing its construction. You might as well refer to a database as a large string of bytes, then complain that it's not solely focussed on human readability either.
use a viewer
Most of us would like to be able to use more than a web browser, by now. Try stepping out of the early 90s. The air's better up here.
This would only help in reading the actual data
Or machine parsing.
but not in bug fixing
Well, that isn't the point at all, so oh well. 'Course, since it's machine parsable, it actually would be quite a bit easier to find markup errors (which aren't the same as bugs.) So even though that's not the point, you're still wrong.
because the XML is that much more unreadable.
Er, XHTML is an XML dialect. The difference between XML and XHTML/HTML, unless you're dealing with XSL or XPath, is negligible. Thanks for pretending to know things you don't, though; it always makes for entertaining reading.
Moderators: informative means "gives us new information we didn't previously have." The moderation you were looking for was insightful, except of course that parent isn't that either.
StoneCypher is Full of BS
And I think that muddling data and presentation without explicit distinction is exactly what was wrong with HTML. Which we just spent a decade slightly recovering from. I guess IBM has made a lot of money on crappy tools, good tools to extract data from crappy data, and extra money for doing it right.
--
make install -not war
I was going to say "I Don't Get It" but somebody beat me to it.
I think the title of TFA "Separate data and formatting with microformats" is a bit ironic since it's about wedging your data into a web page in such a fashion that somebody might be able to pull it back out.
If you want to make your data available there are all sorts of standard and more efficient ways of doing it than embedding it in the presentation layer. If somebody is going to all the trouble to create a parseable human-readable page, why wouldn't they go to about the same amount of trouble and make a far more efficient and standard RSS feed? What about the buzzword of the last few years, SOAP? Hell, what about XML?
From TFA:
I agree. This reminds me of the lame number tricks where you have somebody pick a number, add something, multiply it by something, blah blah blah, you take the result, divide it by 7 and then you give them their original number because you had it all set up ahead of time. If they screw up in their calculations, the trick doesn't work. In this thing, if you screw up embedding the text within the HTML (plenty of ways to do that), the trick doesn't work - and doesn't accomplish much even if it does.
Look into JSON. It's basically JavaScript data structures that you eval on the client. Why bother assembling thick XML that needs to be parsed on the client? XML is slow, and even slower if you have to XSLT it out of the XHTML.
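As a rough illustration of the JSON suggestion, the same kind of event record could be shipped as a small JSON object. The field names below simply reuse the hCalendar property names from the thread and are an assumption, not a defined format; the sketch uses a JSON parser rather than eval, since evaluating untrusted text as code is unsafe.

```python
import json

# Hypothetical payload: field names borrowed from the hCalendar class names.
PAYLOAD = (
    '{"summary": "My Conference opening", "dtstart": "20060501", '
    '"dtend": "20060502", "location": "Hollywood, CA"}'
)

event = json.loads(PAYLOAD)
print(event["summary"], event["dtstart"], event["dtend"])
```

The trade-off the rest of the thread points at still applies: a separate JSON payload is easy to parse, but it is no longer the same text the human reader sees on the page.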