Independent Data and Formatting with Microformats

Geez, man... by Chysn · 2006-07-11 11:26 · Score: 3, Insightful

Some of us have been doing this for YEARS. At least now we have a buzzword for it.

--
--I'm so big, my sig has its own sig.
-- See?

Re:Geez, man... by ManoSinistra · 2006-07-11 11:30 · Score: 0

I appreciate the good tutorial. Hopefully this will help to make it a more common practice and speed up the development of the web. Way to go XHTML.
Re:Geez, man... by ChaoticChowder · 2006-07-11 12:19 · Score: 2, Informative

I just wrote a Java program to do all that in one step last week. I even took it a step further and used the Sun classes for parsing HTML and Xerces for XHTML. Anyone who has ever had to do a datamining project knows how to do this. I don't really think this is a big deal at all. Just another excuse to apply a Web 2.0 buzzword to a technique that's been around for quite a while. Tutorials on the web these days are getting to be pretty lame. Maybe I'll write a couple myself; at least then I'd have a chance of being recognized on /.
Re:Geez, man... by frisket · 2006-07-12 09:10 · Score: 3, Insightful

Some of us have been doing this for YEARS. At least now we have a buzzword for it.
There is already a buzzword: tag abuse. It's the last resort of the untalented.
This particular version is known as semantic imputation (giving things meanings they don't inherently have). It's neither new, special, exciting, nor useful, but at least we now know how little the people at IBM and Leverage Software know about markup and XML.
I guess I'd better add a warning to the XML FAQ about it...

META headers by RobotWisdom · 2006-07-11 11:26 · Score: 1, Interesting

How much of this could have been done 5 years ago if the structured-HTML community hadn't blindly rejected META headers?

Re:META headers by Anonymous Coward · 2006-07-11 11:51 · Score: 4, Informative

Get off your hobby-horse, Jorn. At some point, please realise that you are clueless about markup. Only then will you be able to learn a bit about what you are so high-and-mighty about.

Firstly, <meta> is an element type, not a header. It doesn't do your credibility much good when you don't even know what it is.

Secondly, <meta> is an astonishingly limited element type. It's scoped to the page, not particular parts of it, and it has a plain-text content model because it uses attributes instead of child elements.

Thirdly, I anticipate you saying that you could fix this by changing the <meta> element type. Sure you could. You could fix it by changing it to a set of element types that describe content more accurately and changing it so that it could appear in other parts of the document. And you know what you'd have then? The structured HTML that you despise so much. That's right, microformats embody the very thing you are criticising.

Finally, given that HTML hasn't changed recently to allow microformats, everything that is possible today with microformats was possible five years ago with microformats. It's a design strategy, not a new technology.

Again, please learn a bit about something before you turn your nose up at it. You might be smart in other respects, but when it comes to markup, you are dumb. Please accept this so you can change it.
Re:META headers by Anonymous Coward · 2006-07-11 12:36 · Score: 1, Funny

Looks like someone has been trolled up the ass badly.
Re:META headers by RobotWisdom · 2006-07-11 13:20 · Score: 1

Get off your hobby-horse, Jorn
Strangely enough, I just asked a question: "How much of this could have been done 5 years ago if the structured-HTML community hadn't blindly rejected META headers?"
So if anyone is on a hobbyhorse, it's Mr Coward.
It's scoped to the page, not particular parts of it, and it has a plain-text content model because it uses attributes instead of child elements.
So I guess I have to ask again: How much of microformats could have been done using META, given that it's scoped to the page (which is no problem for the most important page semantics), and uses attributes?
Re:META headers by Anonymous Coward · 2006-07-11 13:25 · Score: 0

Ok, so GP made a mistake. Don't we all?
Just present the correct facts and be done with it. No need to be rude.
Re:META headers by Karma+Farmer · 2006-07-11 13:35 · Score: 4, Informative

How much of this could have been done 5 years ago
All of it. Microformats use features introduced with HTML 4.0 in 1997, so all of this was possible nearly 10 years ago.

How much of microformats could have been done using META
None of it. META tags and microformats serve two entirely separate purposes, and neither is in any way a replacement for the other.
Re:META headers by oneiros27 · 2006-07-11 13:46 · Score: 2, Informative

So I guess I have to ask again: How much of microformats could have been done using META, given that it's scoped to the page (which is no problem for the most important page semantics), and uses attributes?

Very little. For instance -- if I had a full page calendar display -- because META is scoped to the whole page, I couldn't include an event record for each individual event -- I'd have to have the person go to a 'more information' link, and then give the event information. If I wanted them to do that, I could've just given them an iCal file. This allows the semantic marking to sit alongside the format to be presented to the user. (As we would assume that the person wouldn't want to pull down all events from the calendar -- think something like registering for classes in college, where you might only want one or a few from the full list of events.)

And many times, even when there is a single event mentioned within a document, it would not be semantically correct to say that the event applies to the entire page -- it may only be a section of the page that is relevant to the event. (eg, the front page of a website, with info about a company, and then an upcoming event announcement)
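To make the scoping argument concrete, here is a minimal Python sketch using only the standard library's html.parser. The class names vevent, summary and location are hCalendar's; the sample page and the ScopeDemo class are made up for illustration, and the parser assumes simple markup (no nested properties, no void elements inside an event). Every <meta> pair lands in one page-wide dictionary, while each vevent block yields its own record sitting next to the text the reader actually sees.

```python
from html.parser import HTMLParser

# Hypothetical page: one page-wide <meta>, two separately scoped events.
PAGE = """
<html><head>
<meta name="description" content="Spring class schedule">
</head><body>
<p>Welcome to the department.</p>
<div class="vevent"><span class="summary">Intro to Markup</span>
  at <span class="location">Room 101</span></div>
<div class="vevent"><span class="summary">XML Workshop</span>
  at <span class="location">Lab B</span></div>
</body></html>
"""

class ScopeDemo(HTMLParser):
    """Collects page-wide <meta> pairs and per-block hCalendar-style properties."""

    PROPS = ("summary", "location")  # the only properties this sketch understands

    def __init__(self):
        super().__init__()
        self.meta = {}     # one dictionary for the whole document
        self.events = []   # one dictionary per class="vevent" element
        self._depth = 0    # nesting depth inside the current vevent (0 = outside)
        self._prop = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "meta" and "name" in a:
            self.meta[a["name"]] = a.get("content", "")
        elif self._depth:
            # inside an event: note whether this element carries a known property
            self._depth += 1
            self._prop = next((c for c in classes if c in self.PROPS), None)
        elif "vevent" in classes:
            self.events.append({})
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
        self._prop = None

    def handle_data(self, data):
        if self._prop:
            record = self.events[-1]
            record[self._prop] = record.get(self._prop, "") + data.strip()

parser = ScopeDemo()
parser.feed(PAGE)
parser.close()
print(parser.meta)    # {'description': 'Spring class schedule'}
print(parser.events)  # [{'summary': 'Intro to Markup', 'location': 'Room 101'}, {'summary': 'XML Workshop', 'location': 'Lab B'}]
```

With only page-level META, the second event's summary would need some invented page-wide name like "event2-summary", with nothing tying it to the paragraph it describes.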

I personally didn't like the examples given in the IBM article. Some of the past examples that I've seen include embedding semantic detail within a paragraph of text (eg, a movie review), so that different review formats could then be processed in an automated way.

--
Build it, and they will come^Hplain.
Re:META headers by RobotWisdom · 2006-07-11 13:50 · Score: 1

How much of microformats could have been done using META... None of it.
I just don't believe that. If you're describing one or more events, why can't you put most or all of those descriptions in META format?
For me, the worst case for the waste in rejecting META is that we could have been putting Yahoo/Dmoz categories there, all this time, but haven't been because the 'cult' didn't think it was fancy enough.
Re:META headers by Anonymous Coward · 2006-07-11 13:54 · Score: 0

If you're describing one or more events, why can't you put most or all of those descriptions in META format?
Because meta elements describe the whole document, not individual elements.
Re:META headers by RobotWisdom · 2006-07-11 14:13 · Score: 1

I couldn't include an event record for each individual event
I'm not convinced you've really tried-- suppose the METAs described an event1, an event2, etc, and whatever Firefox extension is tasked with extracting that info could look for flags in the body that show which event is described where?
Microformats seem to be a classic 'bag taped to the side', because the logic of the semantic web was still poorly visualised when XML was selected. I'm just asking whether META doesn't deserve rehabilitation as a 'bag at the top' instead...
Re:META headers by Anonymous Coward · 2006-07-11 14:23 · Score: 0

Get off your hobby-horse, Jorn

Strangely enough, I just asked a question

You asked a question with inherent criticism, ignorance and arrogance. And you do this every single time I see you say something about markup. You seem to take one incredibly superficial glance at a topic, decide on what you think is right, and call everybody else stupid for not agreeing, even when the approach you think is right doesn't even make sense.

Your idea that <meta> can take the place of microformats is just another example of this. There was no great conspiracy in the community to get rid of <meta>, it's just no good for this type of thing. And if you'd care to actually learn a little about what you disparage, you'd actually realise this.

Karma Farmer adequately covered the rest. It's getting on a decade that you've been able to do this with HTML. Take your head out of the sand. Your arrogance is causing your ignorance.
Re:META headers by stonecypher · 2006-07-11 14:28 · Score: 1

It was all done 20 years before the web existed, as SGML. But thanks for playing.

--
StoneCypher is Full of BS
Re:META headers by Anonymous Coward · 2006-07-11 14:43 · Score: 0

I'm not convinced you've really tried-- suppose the METAs described an event1, an event2, etc, and whatever Firefox extension is tasked with extracting that info could look for flags in the body that show which event is described where?

Good idea. Of course, there would need to be some way of marking the flags so that the Firefox extension can find them. It's easiest to use special characters to minimise the conflict with real text, but the characters would still have to be easily typable.

Oh, I know, why don't we use labels that are delimited with < and >?
Re:META headers by RobotWisdom · 2006-07-11 15:05 · Score: 1

I personally didn't like the examples given in the IBM article
Is that
<abbr class="dtstart" title="20060501">May 1</abbr> -
<abbr class="dtend" title="20060502">02, 2006</abbr>
crap really the best they've got? It makes my eyes bleed...
(Since the association between the human- and machine-readable texts is wholly imaginary, why not keep the machine version in META?)
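The snippet quoted here uses the abbr design pattern: the title attribute carries a compact date for software, and the element text carries the date for people. A small Python sketch (standard library html.parser; the AbbrDates class is hypothetical) shows that a consumer reads only the title, which is exactly why the two copies can drift apart, the "wholly imaginary" association being objected to.

```python
from html.parser import HTMLParser

class AbbrDates(HTMLParser):
    """Reads dtstart/dtend from <abbr>: machine value from title, human text from content."""

    def __init__(self):
        super().__init__()
        self.machine = {}     # class name -> title attribute (what software uses)
        self.human = {}       # class name -> element text (what the reader sees)
        self._current = None  # class of the <abbr> we are inside, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        cls = a.get("class")
        if tag == "abbr" and cls in ("dtstart", "dtend"):
            self.machine[cls] = a.get("title", "")
            self.human[cls] = ""
            self._current = cls

    def handle_endtag(self, tag):
        if tag == "abbr":
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.human[self._current] += data

snippet = ('<abbr class="dtstart" title="20060501">May 1</abbr> - '
           '<abbr class="dtend" title="20060502">02, 2006</abbr>')
dates = AbbrDates()
dates.feed(snippet)
dates.close()
print(dates.machine)  # {'dtstart': '20060501', 'dtend': '20060502'}
print(dates.human)    # {'dtstart': 'May 1', 'dtend': '02, 2006'}
```

Nothing in the parser checks that "May 1" and "20060501" agree; keeping them consistent is left to whoever writes the page.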
Re:META headers by Baricom · 2006-07-11 16:19 · Score: 1

In IBM's defense, it's not a format they've made up. hCalendar's primary author is from Technorati.
Re:META headers by Hynee · 2006-07-11 16:50 · Score: 1

People like using the HTML the way they do. Use newsgroups if it suits you better. Simple as that.

--
Damn, I already moderated this topic. Now I'll have to log in with my sock puppet to comment.

Firefox by Sir_Lewk · 2006-07-11 11:35 · Score: 1

I didn't know IBM used Firefox; I'd have figured that they had their own "in-house" browser. Neato.

--
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)

Re:Firefox by Sir_Lewk · 2006-07-11 11:47 · Score: 2, Funny

Why shouldn't they have an in-house browser? They do have Lotus Notes.

--
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
Re:Firefox by Drooling+Iguana · 2006-07-11 11:55 · Score: 1

WebExplorer FTW!

--
... I'm addicted to placebos
Re:Firefox by Anonymous Coward · 2006-07-11 12:08 · Score: 0

I'll take that as a "not too good."
Re:Firefox by siegesama · 2006-07-11 12:08 · Score: 1

Neither Lotus Notes nor Lotus Sametime (which I'm expecting someone else to mention any moment now) is really "in-house". They're both applications produced by companies purchased by IBM, which are still marketable products. We're just "eating our own dogfood".

--
what the hell is a 'junk character', anyway?
Re:Firefox by FooAtWFU · 2006-07-11 13:48 · Score: 1

They sort of have an "in-house" edition of Firefox that you can install through the IBM Standard Software Installer (or which you might get preinstalled on your ThinkPad). It's effectively the exact same thing, has a few extra search engines maybe (for searching the intranet, the internal "blue pages" directory, et cetera) and a little string in the window titlebar... maybe a few icons are different here and there...

--
The World Wide Web is dying. Soon, we shall have only the Internet.

Tagging in Text by inKubus · 2006-07-11 11:36 · Score: 1, Informative

This is just tagging in text; it's exactly what you do for CSS: you're saying this text is of a certain class, and you contain it in a box. All this is doing is using the same stuff, storing a little variable name and using it later. One might argue you are already doing that with CSS; it's just formatting stuff you're attaching to the variable rather than, ah, data structure.

I do like the idea of being able to move XML around without having to parse to view the basic file in a formatted fashion. So, you're mixing HTML with a tag. Again, SO WHAT? But what about the encapsulated text, what's the point? If you're going to use a viewer eventually (because you have the encapsulated text), use a viewer. This would only help in reading the actual data, but not in bug fixing, because the XML is that much more unreadable.

On the other hand, this is kind of like the PDF format, with text as text. The PDF client renders it as a font bitmap but it's rendered from TEXT in the PDF, therefore you can do things like cut/paste/etc. This takes it a step further by adding a data structure around it which allows you to import rows of things. Pretty sweet, I might use this somewhere. I can see it being useful in mobile stuff, so you don't have to muck with a client parser.

--
Cool! Amazing Toys.

Re:Tagging in Text by cdcarter · 2006-07-11 12:22 · Score: 2, Insightful

The difference between this and text tagging is that this has a set structure.

--
"Love is like a trampoline, first it's like "SWEET!!" then it's like *BLAMM!*"
Re:Tagging in Text by Mr_Tulip · 2006-07-11 13:05 · Score: 4, Informative

The thing that makes Microformats stand out from homebrew versions is the attempt to standardize the formats, allowing others to easily work out what microformat you are using and integrate them into their own site.
The article mentions the wiki, but doesn't link to it, except at the very bottom of the resources section.
Re:Tagging in Text by stonecypher · 2006-07-11 14:38 · Score: 2, Insightful

I do like the idea of being able to move XML around without having to parse to view the basic file in a formatted fashion. So, you're mixing HTML with a tag. Again, SO WHAT? But what about the encapsulated text, what's the point?

To make things application parsable. Try reading the article before complaining that you don't see the point.

If you're going to use a viewer eventually

If you'd bother to read the article, which is about comparing one application parsable format (iCal) to the new microformat, you'd understand that the web is moving towards human-readable things being software-readable too.

(because you have the encapsulated text)

That's like referring to a car as a pile of steel and glass: it completely ignores the purpose of something in favor of describing its construction. You might as well refer to a database as a large string of bytes, then complain that it's not solely focussed on human readability either.

use a viewer

Most of us would like to be able to use more than a web browser, by now. Try stepping out of the early 90s. The air's better up here.

This would only help in reading the actual data

Or machine parsing.

but not in bug fixing

Well, that isn't the point at all, so oh well. 'Course, since it's machine parsable, it actually would be quite a bit easier to find markup errors (which aren't the same as bugs.) So even though that's not the point, you're still wrong.

because the XML is that much more unreadable.

Er, XHTML is an XML dialect. The difference between XML and XHTML/HTML, unless you're dealing with XSL or XPath, is negligible. Thanks for pretending to know things you don't, though; it always makes for entertaining reading.

Moderators: informative means "gives us new information we didn't previously have." The moderation you were looking for was insightful, except of course that parent isn't that either.

--
StoneCypher is Full of BS
Re:Tagging in Text by inKubus · 2006-07-11 20:01 · Score: 1

I was like, "I do this every day", so what? I see from the Wiki that it's like RSS, and they are trying to standardize the formats. Thanks. Everything is getting closer to being truly useful every day.

--
Cool! Amazing Toys.

LISP by Anonymous Coward · 2006-07-11 11:56 · Score: 5, Insightful

I'm sure the LISP community would love to hear about this brand-new idea of embedding specialized, or domain-specific if you will, languages and data. How extraordinarilly novel.

You'll be running a limited LISP implementation on every browser in no time!

Re:LISP by rblum · 2006-07-11 12:49 · Score: 5, Funny

I wish the LISP community would finally stop whining and realize they're doing nothing we old farts haven't done in Turing machines!
Re:LISP by The_Wilschon · 2006-07-11 13:29 · Score: 2, Interesting

http://en.wikipedia.org/wiki/Turing_tarpit

--
SIGSEGV caught, terminating

wait... not that kind of sig.
Re:LISP by stonecypher · 2006-07-11 14:40 · Score: 0, Troll

I love it when the LISP community pretends they invented things they didn't, and that it's going to lead to LISP being in places it'll never be.

It's even better when they can't spell simple words like extraordinarily.

--
StoneCypher is Full of BS
Re:LISP by fm6 · 2006-07-11 15:15 · Score: 2, Interesting

I know an old LISP hacker who simply doesn't understand all the fuss over XML. To him XML documents are just S-Expressions, only klunkier!
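The structural claim is easy to check. A short Python sketch with the standard library's xml.etree.ElementTree maps an element tree onto an S-expression string; the to_sexpr function and its (@ ...) attribute list, borrowed loosely from the SXML convention, are illustrative choices rather than any standard mapping.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an ElementTree element as a Lisp-style S-expression string.

    Attributes go in an (@ (name "value") ...) list, loosely following SXML.
    Quotes inside text are not escaped; this is only a sketch.
    """
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexpr(child))
        if child.tail and child.tail.strip():  # text that follows a child element
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"

print(to_sexpr(ET.fromstring("<name><last>Henry</last></name>")))  # (name (last "Henry"))
print(to_sexpr(ET.fromstring('<name last="Henry"/>')))            # (name (@ (last "Henry")))
```

The mapping is mechanical in both directions, which is the old hacker's point; what XML adds is mostly the named closing tags.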
Re:LISP by hey! · 2006-07-12 02:21 · Score: 1

You had Turing Machines? Sonny, you kids don't know how good you had it. Back in my day we had to quarry granite blocks, drag them hundreds of miles, then fuss with them, just so we'd know if our barley crop was safe from frost.

And did we get appreciation for all that work? Hah. Some people must think slaves grow on trees. They don't. They just end up on 'em.

Or maybe in flaming wicker baskets.

Um, what were we talking about?

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:LISP by thePowerOfGrayskull · 2006-07-12 02:49 · Score: 1

I wish the old fart community would stop whining and realize they've done nothing we haven't done in... uh...

Standardization is the problem by Anonymous Coward · 2006-07-11 12:02 · Score: 5, Insightful

This suffers from the same thing XML did. Remember when XML was going to revolutionize communication between computers by structuring everything consistently? Then <lname> tripped over <lastname> which was crawling on the floor after being decked by <name last="Henry"/> who was rather pissed off after an argument with <name><last>Henry</last></name> and the whole thing went down in a pile of flames and is now relegated to being a 2MB configuration parsing library to embrace and extend "option=value".

So now why is this "vevent" class special, and who decided it would be "vevent" and not "scheduledevent" or "calendarevent" or "microsoftcalendarhassomethingforyoutodotoday"? Clearly as a human I can look at "dtstart" and think about it and realize that this means the starting date, but how does a computer know this? If the "semantic web" is going to take off, then we need semantics, and pronto.

Hopefully any standardization doesn't turn into a nightmare though. I used to develop in the healthcare insurance claims field, and the old NSF format for transmitting an insurance claim electronically was a horrible death-by-committee piece of work. It was as if nobody could come to a consensus and the committee decided to just throw everything in. You might look at your insurance card and think "gee I have an insurance ID number" but no, in the NSF, there were about 10 different blanks for insurance IDs, depending. Is it a Medicare number? Then it goes in the Medicare blank. God forbid the computer would have just one blank and assume that if you're billing Medicare then the number in the blank is probably a Medicare ID. Medicare was easy, there's just one. Medicaid in most states has a billion subcontractors, all with names that have nothing to do with "medicaid" so you simply had to maintain a magic list of insurance plans that changed every other year or so that used the Medicaid ID field. Or the separate fields for Blue Cross and Blue Shield. What about the states where you have BCBS as a single entity?

Anyway, I'm digressing (and ranting about a chunk of my life I'd much rather forget). What's important in standardizing in semantics is identifying everywhere where things are identical and reusing semantics whenever possible. Decisions have to be made up front as to what is the relationship between "name" and "last name" (people have a name, which has a last name, yet companies have names that typically don't have a last name. What about a cat named "John K. Wibblesworth" how is that different from one named "Tama"?) Yet, take dtstart which is used here for a calendar event. Should we have "dtclassstart" for the first day of school?

Re:Standardization is the problem by Bogtha · 2006-07-11 12:14 · Score: 4, Insightful

Remember when XML was going to revolutionize communication between computers by structuring everything consistently?

No. I do remember how a lot of clueless PHB-types ran around telling everybody that though. XML solves the parsing problem, not the semantics problem. It's languages built on top of XML that handle semantics.

XML was never meant to solve the problem you are talking about. Parsing markup into a tree is a totally different concept to figuring out what the stuff in the tree means. The only people who ever thought XML had something to do with what you say were totally clueless about XML.
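The parsing/meaning split can be shown with Python's standard xml.etree.ElementTree. All four spellings of a last name mocked in this thread parse into perfectly good trees; deciding that they mean the same thing takes a hand-written mapping the parser knows nothing about. The last_name function is a hypothetical consumer-side rule, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Four encodings of the same fact; XML parses all of them without complaint.
variants = [
    "<lname>Henry</lname>",
    "<lastname>Henry</lastname>",
    '<name last="Henry"/>',
    "<name><last>Henry</last></name>",
]

def last_name(elem):
    """Hypothetical mapping: the 'meaning' that no XML parser supplies."""
    if elem.tag in ("lname", "lastname"):
        return elem.text
    if elem.tag == "name":
        if "last" in elem.attrib:
            return elem.attrib["last"]
        child = elem.find("last")
        if child is not None:
            return child.text
    return None  # an encoding this consumer was never told about

trees = [ET.fromstring(v) for v in variants]
print([t.tag for t in trees])         # ['lname', 'lastname', 'name', 'name']
print([last_name(t) for t in trees])  # ['Henry', 'Henry', 'Henry', 'Henry']
```

The parser did its job four times over; every branch in last_name is an agreement someone had to make outside of XML.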

So now why is this "vevent" class special, and who decided it would be "vevent" and not "scheduledevent" or "calendarevent" or "microsoftcalendarhassomethingforyoutodotoday"?

It's special because it appears in the hCalendar specification. The people who wrote the specification decided it would be "vevent". They intend to submit it to a standards body.

--
Bogtha Bogtha Bogtha
Re:Standardization is the problem by TedTschopp · 2006-07-11 12:53 · Score: 4, Informative

So now why is this "vevent" class special, and who decided it would be "vevent" and not "scheduledevent" or "calendarevent" or "microsoftcalendarhassomethingforyoutodotoday"?

The idea is to leverage standards that are already out there, and in this case it would be the iCalendar standard.
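As a sketch of what reusing iCalendar buys, here is a hypothetical Python converter from properties already extracted from hCalendar markup into iCalendar text. Because hCalendar class names are iCalendar property names in lowercase, the mapping table is nearly trivial. The PRODID value and the to_vevent name are made up; a conforming file under RFC 5545 would also need UID and DTSTAMP, and text values would need commas and semicolons escaped, both omitted here.

```python
# hCalendar class name -> iCalendar property name (a subset, for illustration).
HCAL_TO_ICAL = {
    "summary": "SUMMARY",
    "dtstart": "DTSTART",
    "dtend": "DTEND",
    "location": "LOCATION",
}

def to_vevent(props):
    """Serialize a dict of hCalendar properties as a minimal VCALENDAR/VEVENT."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//hcalendar-sketch//EN",  # hypothetical product ID
        "BEGIN:VEVENT",
    ]
    for cls, value in props.items():
        name = HCAL_TO_ICAL.get(cls)
        if name is None:
            continue  # class names outside the table are ignored
        if name in ("DTSTART", "DTEND") and len(value) == 8 and value.isdigit():
            name += ";VALUE=DATE"  # bare YYYYMMDD is a DATE, not the default DATE-TIME
        lines.append(f"{name}:{value}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"  # iCalendar content lines end with CRLF

print(to_vevent({"summary": "XML Workshop", "dtstart": "20060501", "dtend": "20060502"}))
```

The "vevent" wrapper and "dtstart" names only look arbitrary; they line up one-to-one with BEGIN:VEVENT and DTSTART.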

--
Fantasy remains a human right; we make in our measure and in our derivative mode... -- JRR Tolkien
Re:Standardization is the problem by KingMotley · 2006-07-11 13:30 · Score: 1

Not to get off topic, but there are many reasons why NSF and ANSI 837 can support multiple IDs:

A) Coordination of benefits. If the insured has multiple insurance plans, one insurance company's payout may differ depending on which company you are sending the claim to and what the other insurers pay out. In some cases a primary insurance plan may need to forward the claim to a secondary or tertiary insurance company that uses a different ID.

B) Better insured matching. If you supply an insured's Medicare ID, employee ID, Social Security number (not supposed to do that, but it's rampant), EIN, etc., then if you can't find the patient/insured via one ID, you may be able to find them via the others. This helps reduce claim processing time, as the claim doesn't need to be rejected.

C) Claim clearinghouses. When you are submitting a claim to multiple insurance companies (primary, secondary, tertiary, etc.), the insured/patient information can be contained in a single instance, and the insurance companies receiving the claim can reference that single instance with multiple IDs. Each insurance company can then pull whichever ID it uses internally to identify the insured/patient.
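Point B amounts to a fallback lookup. A minimal Python sketch, assuming a payer keeps an index keyed by (ID type, value); the records, ID values and the preference order are all made up for illustration and do not reflect any real claim format.

```python
# Hypothetical insured records, each carrying whatever IDs the payer knows.
RECORDS = [
    {"name": "Pat Doe", "ids": {"medicare": "123A", "employee": "E-77"}},
    {"name": "Sam Roe", "ids": {"medicaid": "MD-9"}},
]

def build_index(records):
    """Index every record under each (id_type, value) pair it carries."""
    index = {}
    for rec in records:
        for kind, value in rec["ids"].items():
            index[(kind, value)] = rec
    return index

def match_insured(claim_ids, index, order=("medicare", "medicaid", "employee")):
    """Try each ID the claim supplies, in preference order; the first hit wins."""
    for kind in order:
        value = claim_ids.get(kind)
        if value is not None and (kind, value) in index:
            return index[(kind, value)]
    return None  # no match: the claim would be rejected or pended

idx = build_index(RECORDS)
# A stale Medicare ID alone would fail; the extra employee ID still finds the insured.
print(match_insured({"medicare": "999Z", "employee": "E-77"}, idx)["name"])  # Pat Doe
```

With a single ID blank, the first failed lookup would end the search, which is the rejection the extra fields are there to avoid.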
Re: Standardization is the problem by scdeimos · 2006-07-11 13:56 · Score: 1

Decisions have to be made up front as to what is the relationship between "name" and "last name" (people have a name, which has a last name, yet companies have names that typically don't have a last name. What about a cat named "John K. Wibblesworth" how is that different from one named "Tama"?)

And how do you classify people who have just one name, like "Virgil?" I don't mean "Virgil Williams" or "Andrew Virgil," just "Virgil." Is that his first name, last name or something else altogether?

The problem with standards is that people keep making new ones.
Re: Standardization is the problem by ktdid · 2006-07-11 14:18 · Score: 1

I would imagine that the answer would be either class="fn" or maybe class="nickname", since the hCard standard pretty closely follows the vCard standard: http://www.ietf.org/rfc/rfc2426.txt
Re:Standardization is the problem by Anonymous Coward · 2006-07-11 14:36 · Score: 0

I think the problem that this comment and many others demonstrates is that many people can't get away from the idea that if something doesn't "boil the ocean" (that is, solve all possible problems as completely as possible for all people at all times) then it is useless.

Microformats, rather than just being "blessed things you must use from people smarter than you", are an approach to the problem of how we take emergent behavior on the web (such as the fact that lots of people put reviews, or contact information, or information about their relationships with other people on their blogs and other sites) and create usable constructs which work on today's web, with today's browsers and today's tools, with developers' current sets of skills, to enable software to more meaningfully aggregate this information, to enable much better web-based data interchange, and to generally encourage decentralized services.

People are already doing very cool things with microformats, including big organisations like Yahoo! Less than 5 minutes at http://microformats.org/ ought to demonstrate that there is already significant adoption and mindshare, that microformats != XML, lisp etc., and that it's probably better to understand a little before criticizing based on a short synopsis of an article, but I forgot this was slashdot.

But seriously, check it out. No wifi. Not Lame. Will change web. You read it here first.
Re:Standardization is the problem by stonecypher · 2006-07-11 14:54 · Score: 3, Insightful

This suffers from the same thing XML did. Remember when XML was going to revolutionize communication between computers by structuring everything consistently?

Yeah. It works when you use the same DTD, which was the promise. It's not XML's fault that you and your supplier can't get your ducks in a row. The purpose of XML is to provide a medium that two ends can use to standardize a communications format of their own design, while giving a regular form to said formats so that arbitrary formats could be supported by arbitrary tools. It fulfills this ideal quite well, as anyone even vaguely familiar with web standards knows. It is not meant to magically merge two inconsistent standards.

Then <lname> tripped over <lastname> which was crawling on the floor after being decked by <name last="Henry"/> who was rather pissed off after an argument with <name><last>Henry</last></name>

Yeah. And that's XML's fault how? Get a DTD and stick to it.

and the whole thing went down in a pile of flames

Yeah, essentially every office suite, database, most graphics editors, many layout programs, and quite a few games support XML. Jabber / Google Chat run on XML. The web is built on an SGML dialect, which is largely being converted into an XML dialect; XML is itself an SGML dialect. Web 2.0 (god I hate that name) is an outcropping of XML's parsability. XML is so useful that Microsoft was able to use it to ward Massachusetts' lawsuits off. The United Nations now releases their transcripts solely in XML. XML is now the second most pervasive data storage format on earth, after CSV/TSV, and it's gaining fast. (Don't bother saying SQL - it's an API, not a storage format.)

Exactly what is your definition of "going down in flames" ?

and the whole thing went down in a pile of flames and is now relegated to being a 2MB configuration parsing library to embrace and extend "option=value".

Uh, TinyXML has a footprint of 40k, champ. Also, that's not what "embrace and extend" means.

So now why is this "vevent" class special, and who decided it would be "vevent" and not "scheduledevent" or "calendarevent" or "microsoftcalendarhassomethingforyoutodotoday"?

What a surprise, the guy who couldn't standardize on a DTD now fails to understand other format standardizations. Read the article, champ. It's not SlashDot's job to read for you, and this one's honestly pretty simple. Indeed, the specific purpose of microformats is to address your whining, but you don't see the point. Cough.

Clearly as a human I can look at "dtstart" and think about it and realize that this means the starting date, but how does a computer know this?

Er, by supporting a specific microformat. Are you putting in effort to be dense? It's the same way they support iCal, or MS Word files, or in fact any format at all, ever.

If the "semantic web" is going to take off, then we need semantics, and pronto.

This has nothing to do with the semantic web. You want to drop another? Web Ontology Language sounds important too. Use that one more often: fewer people will see through you.

God forbid the computer would have just one blank and assume that if you're billing Medicare then the number in the blank is probably a Medicare ID.

Yes, I'm sure the people billing Medicare who aren't using Medicare IDs will be greatly amused that your application just fails for them. Why is it that I don't believe you had much to do with the design of the system?

What's important in standardizing in semantics is identifying everywhere where things are identical and reusing semantics whenever possible.

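"Semantics" aren't reusable. They're not arbitrarily applied. Please stop using words you fail to understand. Not every markup of data is semantic, even if the markup means something. Semantics are the work of understanding context, not identifying relationships. Telling the difference between two kinds of ID code isn't semantic. Telling the difference between bug (insect) and bug (Volkswagen), however, is.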

--
StoneCypher is Full of BS
Re:Standardization is the problem by grcumb · 2006-07-11 15:40 · Score: 2, Insightful

" Then <lname> tripped over <lastname> which was crawling on the floor after being decked by <name last="Henry"/> who was rather pissed off after an argument with <name><last>Henry</last></name> "

"Yeah. And that's XML's fault how? Get a DTD and stick to it."

Well, actually, schema and RDF were supposed to address exactly that issue. So, in the opinion of the W3C, at least, it appears 'Get a DTD and stick to it' isn't the complete answer.

But that's a simplistic retort. The truth is that there are many cases (especially when individual business-to-business transactions are concerned) where 'Get a DTD and stick to it' is probably the right answer. It's simpler, if nothing else.

That's not the end of the conversation, though. There are a number of cases where future communications and permutations simply can't be known, and in situations like that, the option of sticking to a single DTD simply doesn't exist. In theory at least, schema and RDF supply the means to handle semantic translation of data.

'"Semantics" aren't reusable. They're not arbitrarily applied. Please stop using words you fail to understand. Not every markup of data is semantic, even if the markup means something. Semantics are the work of understanding context, not identifying relationships. Telling the difference between two kinds of ID code isn't semantic. Telling the difference between bug (insect) and bug (Volkswagen), however, is.'

That may be true, but I remember very clearly listening to Tim Berners-Lee introducing the Semantic Web in Toronto back in '99, and the example he used of how the Semantic Web would work showed A being determined to be semantically the same as C because A and B were known to be equivalent, and B and C were known to be equivalent as well. So while it's technically correct to say that semantics has nothing to do with translation, the promise of the Semantic Web is that one is able to translate between ad hoc data types precisely because their semantics can be inferred.
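The A-B-C example is transitive inference over declared equivalences. RDF/OWL would state such facts with properties like owl:equivalentProperty and leave the inference to a reasoner; the Python sketch below is a hand-rolled stand-in using a union-find structure, with a hypothetical Equivalences class and term names borrowed from this thread and from hCard's family-name.

```python
class Equivalences:
    """Union-find over term names: declaring A~B and B~C lets us infer A~C."""

    def __init__(self):
        self.parent = {}

    def find(self, term):
        """Return the representative term of the class containing `term`."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]  # path halving
            term = self.parent[term]
        return term

    def declare_same(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)

eq = Equivalences()
eq.declare_same("lname", "lastname")        # vocabulary A ~ vocabulary B
eq.declare_same("lastname", "family-name")  # vocabulary B ~ vocabulary C
print(eq.same("lname", "family-name"))      # True, though never stated directly
print(eq.same("lname", "given-name"))       # False
```

The inference is only as good as the declarations; someone still has to assert each pairwise equivalence, which is where the "get a DTD and stick to it" argument comes back in.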

I won't comment on the effectiveness of schema and RDF in practice. Suffice it to say that no one's found many compelling (or at least popular) uses for either so far. That said, we still don't take advantage of much of HTML and CSS, so the problem may be PEBCAK (or just impatience) rather than poor design.

--
Crumb's Corollary: Never bring a knife to a bun fight.
Re: Standardization is the problem by martin-boundary · 2006-07-11 16:56 · Score: 1

Virgil is not his last name. His full name is Publius Vergilius Maro. So you would probably fill in the name field as Publius V. Maro.
Re:Standardization is the problem by Anonymous Coward · 2006-07-11 17:11 · Score: 0

Get a DTD and stick to it.

DTDs provide structure but no meaning beyond what humans ascribe to it. Sure, I can standardize on a DTD, but the lname tag has no meaning to the computer beyond that of the mapping I created to the appropriate column in the database. Likewise, you claim that meaning is not reusable and is not arbitrarily applied, yet you chose to define bug as a Volkswagen, rather than as a vehicle, car, or automobile, all of which would be different to a computer armed with only strncmp().

Incidentally, if you're billing Medicare without the patient's Medicare ID number and it's not failing, I'd really love to know how you got that to work. I'm sure Medicare's insurance fraud people would love to know how you're doing it too.
Re:Standardization is the problem by stonecypher · 2006-07-11 18:42 · Score: 1

Yeah. And that's XML's fault how? Get a DTD and stick to it.

Well, actually, schema and RDF were supposed to address exactly that issue.

Schema is a replacement for DTD, because DTD has some subtle problems. RDF is actually for describing what's available on a service, not what's contained in one document; in a weird sort of way it's a conceptual parallel to the two for servers.

That all said, it's worth noting that XML considers its data type as a critical and un-removable part of the document. So, sure, you can use DTD, you can use Schema, you could use Relax-NG, whatever. The point is, the fault is the lack of an exchange standard, not a flaw in XML, and the exchange standard is the responsibility of the user.

That's not the end of the conversation, though. There are a number of cases where future communications and permutations simply can't be known, and in situations like that, the option of sticking to a single DTD simply doesn't exist.

This is a problem in the XML specification, because very few people read the W3C discussions that led to the standard's finalization (and, indeed, they shouldn't have to.) From XML's perspective, this is considered a versioning issue, not a future-proofing issue; in theory the appropriate thing to do is to make your DTD available to versioning, and that's supposed to be the end of it. That's why doctypes require a version field and (although nobody ever checks it) a resource descriptor.

In theory at least, schema and RDF supply the means to handle semantic translation of data.

Like I told grandparent, this isn't a semantic issue. It's a shame people have begun to use the word to mean whatever's on their mind at the moment. Semantics have a very specific position within the context of web technologies: they are *solely* about interpreting data within context. Assuming a proper DTD, semantics are quite unnecessary. The semantic web is about making determinations which we typically suggest are the realm of humans. This is why I always use the bug (insect) and Bug (Volkswagen) example: it's really only about teaching the machine to tell specifically what we mean when we're dealing with homonyms, heteronyms, retronyms, metonyms, toponyms, and other things which may only be inferred from context.

The common example is that of a search engine. Google would be smarter if you could tell it you only wanted things about Champagne, the city, instead of the drink, the singer, the kind of wrestling or what have you. The semantic web is about that and only that problem. It has nothing to do with marking up a document for context, and indeed a well marked up document is far less needy of the semantic web.

That may be true, but I remember very clearly listening to Tim Berners-Lee introducing the Semantic Web in Toronto back in '99, and the example he used of how the Semantic Web would work showed A being determined to be semantically the same as C because A and B were known to be equivalent, and B and C were known to be equivalent as well. So while it's technically correct to say that semantics has nothing to do with translation, the promise of the Semantic Web is that one is able to translate between ad hoc data types precisely because their semantics can be inferred.

You're confusing the semantic web and the ontological web. The former is the tool to support the discretion. The latter is the rule framework for actually performing the discretion. The W3C has a pretty good explanation of OWL on their page, and it's fairly common for the two topics to be wholly intertwined in discussion.

I won't comment on the effectiveness of schema and RDF in practice.

I will. Schema suck - they don't solve most of DTD's problems and cause a host of new ones in their wake. Prefer Relax-NG. RDF has potential, but we agree that nobody uses it for anything genuinely interesting yet. We'll see whether they do in the long run; I'm of the opinion that it's not goin

--
StoneCypher is Full of BS
Re:Standardization is the problem by stonecypher · 2006-07-11 18:50 · Score: 1

DTDs provide structure but no meaning beyond what humans ascribe to it.

Uh, yeah, that's because that's what they're for. I said the reason he didn't have structure was because he didn't provide an appropriate structure document. Now you're trying to rebut me by saying that they're really only for structure.

I fail to see the disconnect here.

Likewise, you claim that meaning is not reusable and is not arbitrarily applied, yet you chose to define bug as a Volkswagen, rather than as a vehicle, car, or automobile, all of which would be different to a computer armed with only strncmp().

You seem to have missed the point. The point is that you can't arbitrarily say that this bug is now also a Volkswagen; it is and it always was, if it ever was at all, and if it wasn't, it never will be. You can't copy semantics because they're unique and determinant. There is no germane relationship between two different bugs. Unlike DTD, there is no appropriate measure for standardizing context. The germane point is that a schema and semantics are in many ways need-exclusive: if you have one, the other is effectively useless. If you have a thorough schema for data, then you know perfectly well what's contained, and you don't need to mark it up semantically. If you have thorough semantics, then a schema is redundant.

Incidentally, if you're billing Medicare without the patient's Medicare ID number and it's not failing, I'd really love to know how you got that to work.

In many cases, Medicare will subsidize medical situations not already covered by insurance, even for those who aren't yet Medicare registered. A good example is addiction: the government provides a stipend to people with substance habits under the Americans with Disabilities Act of 1990. This coverage is provided through Medicare as a matter of simplifying bureaucracy, and is not in fact limited to the elderly and mentally challenged; you can be a perfectly healthy young person with a cocaine problem, and get help. In those cases, Medicare will reimburse you by your Blue Cross ID, your social security number, or any of a host of other identifying numbers.

Indeed, Medicare does not require a Medicare ID. It simply requires an ID. The reason grandparent's software doesn't require Medicare charges to come from a Medicare ID is that neither does the government.

I'm sure Medicare's insurance fraud people would love to know how you're doing it too.

In general you shouldn't accuse people of fraud regarding a system you don't understand. It's offensive.

--
StoneCypher is Full of BS
Re:Standardization is the problem by MichaelMD · 2006-07-11 22:31 · Score: 1

It's also easy to remember for anyone who has dealt with iCal data: the names used in hCalendar are basically lower-case versions of the equivalent iCal names and are used for the same things, so that makes it easier to convert from hCal to iCal and vice versa.
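Because the names line up, the hCal-to-iCal direction can be little more than a case change. A minimal Python sketch, assuming the hCalendar properties have already been pulled out of the page into a dict (the function names are hypothetical). It escapes TEXT values but omits line folding and properties a strict iCalendar consumer expects, such as PRODID, UID and DTSTAMP:

```python
def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values.
    Backslashes go first so the later escapes aren't doubled."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def hcal_to_ical(props):
    """Render hCalendar-style properties (lower-case class names) as a
    minimal VEVENT; the iCalendar names are the upper-case forms."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "BEGIN:VEVENT"]
    for name, value in props.items():
        # Escaping every value is a simplification; plain dates
        # contain no special characters, so they pass through.
        lines.append(f"{name.upper()}:{escape_text(value)}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


event = {
    "dtstart": "20060501",
    "dtend": "20060502",
    "summary": "My Conference opening",
    "location": "Hollywood, CA",
}
print(hcal_to_ical(event))
```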
Re:Standardization is the problem by cerberusss · 2006-07-11 22:50 · Score: 1

"Leverage"... *scratch*
"standard"... *scratch*
"XML"... *scratch*
"microsoft"... *scratch*

Bullshit!

--
8 of 13 people found this answer helpful. Did you?
Re:Standardization is the problem by Anonymous Coward · 2006-07-12 01:51 · Score: 2, Insightful

XML solves the parsing problem, not the semantics problem.

What parsing problem? Parsing is one of the most well-understood areas of computer science. Any comp. sci. graduate should be able to knock up a simple recursive descent parser, and there are dozens of good parser generators out there. It is the lack of semantics that makes XML little better than plain text: all the hard problems are left to applications.
Re:Standardization is the problem by Phreakiture · 2006-07-12 01:59 · Score: 1
Decisions have to be made up front as to what the relationship is between "name" and "last name" (people have a name, which has a last name, yet companies have names that typically don't have a last name).

Never mind company names; names of persons can be extremely difficult to parse. That which we call a "last" name is usually better described as a "family" name. Consider the following names:
- John Smith
- Wu Xue Jen
- Juan Carlos Jimenez Garcia
- Rev. Dr. Martin Luther King, Jr., PhD
All four of these names have a thing that we would colloquially call a last name (referring to the family name), but only in the first case is it literally the last term in the person's name.

In the Chinese case (second item on the list), the family name is first. This lady's name is Xue, and her whole family's names begin with Wu. Further confusing the issue, she may, in migrating, "romanise" her name and go by Xue Jen Wu, or she may adopt a western name and go by something like Janet Wu or Jenny Wu or whatever strikes her fancy. It can be very difficult in this context to (a) determine which name is the family name, and (b) correctly determine that Wu Xue Jen, Xue Jen Wu and Janet Wu are all the same person (or not).

In the Hispanic case (third example on my list above), the family name is second to last. Garcia is his mother's maiden name. Binomially, he would be known as Juan Jimenez. This can be more difficult to detect if he doesn't use his middle name, but does use his materno, e.g. Juan Jimenez Garcia. You are left to figure out whether Garcia is his family name and Jimenez is his middle name, or whether Jimenez is his family name and Garcia is his materno.

All of this is without addressing prefixes (Mr. Mrs. Ms. Miss. Dr. Rev. etc), suffixes (I, II, III, Jr., Sr., etc.) or credentials (PhD, DDS, MD, DVM, PE, BSEE, BSCS, BTA, etc), which, of course, are why my fourth, and very well-known, example is there. And don't forget, of course, that "I" and "Sr." must be considered the same, and that "II" and "Jr." must be considered the same, since some who find themselves stuck with such a name (as I am) may switch back and forth as the spirit moves them.
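Treating "I" as "Sr." and "II" as "Jr." amounts to mapping each suffix to a canonical key before comparing. A minimal Python sketch; the equivalence table is an assumption taken from the paragraph above, not from any standard:

```python
# Canonical keys for generational suffixes: "I" ~ "Sr." and "II" ~ "Jr."
# Anything not listed compares as itself (e.g. "III").
SUFFIX_EQUIVALENTS = {"i": "sr", "sr": "sr", "ii": "jr", "jr": "jr"}


def normalize_suffix(suffix):
    """Drop surrounding spaces and a trailing period, lower-case,
    then map to a canonical key."""
    key = suffix.strip().rstrip(".").lower()
    return SUFFIX_EQUIVALENTS.get(key, key)


def same_suffix(a, b):
    return normalize_suffix(a) == normalize_suffix(b)


print(same_suffix("II", "Jr."))   # True
print(same_suffix("Sr.", "I"))    # True
print(same_suffix("Jr.", "III"))  # False
```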

I have two solutions to this; neither is wonderful.

First, you can redundantly record the person's whole name, along with a "goes_by" and "surname" field, e.g. Mr. Jimenez in my example above might be listed as: ('Juan Carlos Jimenez Garcia', 'Juan', 'Jimenez') to indicate that his family name is "Jimenez" and he goes by "Juan". (This could also be very useful with Richard, Michael, David, etc., who might go by Rick, Ricky, Rich, Richie, Dick, Dickie, Mike, Mikey, Mickey, Dave, Davey, as well as for people who go by their middle name.)
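The first option maps naturally onto a record with three string fields. A Python sketch; the class, field and function names are hypothetical, and the "Jenny" entry is an invented example of an adopted Western name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    full_name: str  # the whole name, exactly as the person writes it
    goes_by: str    # what they prefer to be called
    surname: str    # the family name, wherever it falls in full_name

    def informal(self):
        return f"{self.goes_by} {self.surname}"


def find_by_surname(people, surname):
    """Case-insensitive lookup on the redundant surname field."""
    wanted = surname.casefold()
    return [p for p in people if p.surname.casefold() == wanted]


people = [
    PersonName("Juan Carlos Jimenez Garcia", "Juan", "Jimenez"),
    PersonName("Wu Xue Jen", "Jenny", "Wu"),
    PersonName("John Smith", "John", "Smith"),
]
print(people[0].informal())  # Juan Jimenez
print([p.full_name for p in find_by_surname(people, "wu")])
```

Storing the whole name alongside the two derived fields costs some space, but searches never have to guess where the family name sits.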

The other option is to list the whole name as an array, and use indices to indicate which are the first and last name, e.g. ({'Juan', 'Carlos', 'Jimenez', 'Garcia'}, 1, 3). This uses less storage space, but will likely be more difficult to search and less flexible.
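In Python the second option is better written with a tuple than with braces, since {...} would build an unordered set and lose the positions the indices depend on. A sketch assuming 1-based indices as in the example (the function names are hypothetical):

```python
def name_parts(terms, given_index, family_index):
    """Return (given name, family name) using 1-based indices,
    matching the (terms, 1, 3) example above."""
    return terms[given_index - 1], terms[family_index - 1]


def has_family_name(record, family):
    """Searching means decoding every record: the extra cost of the
    compact layout."""
    terms, _, family_index = record
    return terms[family_index - 1].casefold() == family.casefold()


records = [
    (("Juan", "Carlos", "Jimenez", "Garcia"), 1, 3),
    (("Wu", "Xue", "Jen"), 2, 1),  # family name comes first
]
print(name_parts(*records[0]))  # ('Juan', 'Jimenez')
print([r for r in records if has_family_name(r, "wu")])
```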
--
www.wavefront-av.com
Re:Standardization is the problem by SporkLand · 2006-07-12 12:18 · Score: 1

I'm not disparaging, I'm genuinely wondering:
"Semantics are the work of understanding context, not identifying relationships."

Isn't the work of "understanding context" simply identifying the relationships between certain items in your data and other items, which may involve discovering further relationships?

I'm not saying this to be a jerk, I really thought that semantics were derived by understanding the relationship between items.
Re:Standardization is the problem by Bogtha · 2006-07-14 04:22 · Score: 1

What parsing problem?

The problem of "I have a load of data that I need to be able to store and then restore into an easily manipulatable structure in memory."

Parsing is one of the most well-understood areas of computer science.

Just because it's a well-understood area, it doesn't mean data magically leaps out of files into data structures, does it? There's still a problem of actually implementing it.

It is the lack of semantics that makes XML little better than plain text

I assume by "plain text", you mean "ad-hoc format I cooked up on the fly"? You can't parse plain text unless you've solved the NLP problem - and that isn't "a well understood area of computer science"; it is proving very difficult for PhDs, let alone run-of-the-mill graduates.

XML is better than ad-hoc formats because you don't have to write a parser yourself - the problem is solved for you by a glut of libraries for all kinds of different systems. Furthermore, there's all kinds of different software that can manipulate XML in various useful ways. Why do work you don't need to? Just because something does [x] + [y] and not [x] + [y] + [z], it doesn't mean that it doing [x] and [y] isn't useful.
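Both halves of this exchange show up in a few lines: a stock parser turns the markup into a tree with no hand-written parsing code, but deciding that two spellings of a field mean the same thing is still application logic. A Python sketch using the standard library's xml.etree.ElementTree; the document and function name are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two of the shapes from the "lname/lastname" example earlier in the thread.
doc = """<people>
  <name><last>Henry</last></name>
  <name last="Henry"/>
</people>"""


def last_names(xml_text):
    """Parsing is one library call; treating a <last> child and a
    last="" attribute as the same field is our own decision."""
    root = ET.fromstring(xml_text)
    results = []
    for name in root.iter("name"):
        # Prefer a <last> child element, fall back to the attribute.
        results.append(name.findtext("last") or name.get("last"))
    return results


print(last_names(doc))  # ['Henry', 'Henry']
```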

--
Bogtha Bogtha Bogtha
Re:Standardization is the problem by stonecypher · 2006-07-15 10:37 · Score: 1

Isn't the work of "understanding context" simply identifying the relationships between certain items in your data and other items, which may involve discovering further relationships?

No.

Basically, the issue is this. Semantics are specifically the case of attempting to discern the meaning of a word given its usage. When you have something that says "anything in this column is a FOO," then there's no need for semantics: usage is moot, as the meaning of what's in that column is absolutely described. Semantics are a purely natural-language concern. They do not occur in programming at all, ever. Programming languages work entirely on syntax and grammar.

Syntax is, given a fragment of a sentence, the set of rules governing what is allowed in the next step in the sentence. To use a natural-language example, "I went to the ---." English syntax suggests that what's in the --- must be a noun, a context-aware specific adjective (such as "front" or "top,") or a constraint ("best," "largest" or so on.) Yes, there are some weird dodges, but the important thing to understand is that syntax is the set of rules that says "when you write that sentence, filling in the words 'purple,' 'without' or 'twelve' are illegal."

Grammar is the set of rules governing the placement, conjugation or usage of words in order to communicate extra meaning. To use a natural-language example, you may conjugate the name "Joe" into "Joe's" to indicate something belonging to or characteristic of Joe. You may use "-est" to denote that the conjugated word is a limit, such as "greatest" or "coldest." You may move the direct object to the end of the sentence, to indicate that the middle of the sentence is subordinate to that clause, such as "I thought that he, while afraid of the Chinese culture, would go to Beijing to save the business account."

In programming, these things are easier to explain with errors. In C++, "for (int i=0; i" is a syntax error. It is easily understood by a human to be a faulty attempt to set a second constraint on the for loop, but since C++ doesn't allow that structure, it's a syntax error. A grammar error might include trying to declare an anonymous class in the for loop's declaration.

Semantic errors are a little more difficult. Semantics are about inference. Since they don't occur in programming, showing them in computing is hard; I'll need to refer to fact-processing systems (aka expert systems or knowledge engines) like Prolog or Cyc, and to construct an example, in order to make my point. I'm sorry; I've tried to give a simple definition several times, and I am having a hard time doing so in a way that I don't feel is easy to misunderstand.

Consider the case that you're writing a fact system for the Volkswagen company. This fact system is intended to create a large dataset describing what is known about Volkswagen's tool software development process. For the purposes of this example, we'll just pretend they're having a hard time creating milling and die-tooling software on budget, and that they're building the fact system to try to figure the problem out; it's actually a pretty common and useful practice in large industrial environments. I don't expect they're having any real such problems, but let's pretend.

So, the first thing they would need to do would be to teach the system about the process. This means teaching it about the Jetta and the Golf, what's involved in making one, how long this takes, how much that costs, and so on. Then they have to teach the system about the robots that do the actual assembly, what their fault tolerances are, how much maintenance costs, how long it takes, what the impact of being down for a certain amount of time is on the greater system, etc. Then they need to start teaching the system about software conditions like (say) function points, what their error rates are, what the average cost of failure has been, and so on. Now, let's say that we started this project six years ago. The next year, the ne

--
StoneCypher is Full of BS
Re:Standardization is the problem by stonecypher · 2006-07-15 10:40 · Score: 1

Forgetting to close the on code is for the lose. Sorry about the eye-pain.

--
StoneCypher is Full of BS

I don't get it... by grumbel · 2006-07-11 12:27 · Score: 4, Insightful

Ok, so this "microformats" thing is about encoding extra data inside an HTML file by abusing CSS class names for markup. Isn't that completely unnecessary and nothing more than an ugly hack? Don't we have XML namespaces for exactly that reason? Wouldn't something like:

<span style="display: none">
  <vevent:event>
    <vevent:dtstart>20060501</vevent:dtstart>
    <vevent:dtend>20060502</vevent:dtend>
    <vevent:summary>My Conference opening</vevent:summary>
    <vevent:location>Hollywood, CA</vevent:location>
  </vevent:event>
</span>

be the 'right'[tm] way to do it?

Re:I don't get it... by Karma+Farmer · 2006-07-11 13:22 · Score: 5, Informative
The class attribute was never intended to be limited to CSS. From the HTML 4.01 specification:
The class attribute... assigns one or more class names to an element; the element may be said to belong to these classes. A class name may be shared by several element instances. The class attribute has several roles in HTML:
- As a style sheet selector (when an author wishes to assign style information to a set of elements).
- For general purpose processing by user agents.
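"General purpose processing" of class names looks like this in practice: a program walks the markup and treats certain class names as property names. A minimal Python sketch using the standard library's html.parser; it recognizes only the vevent root and four hCalendar properties, reads a value from an abbr element's title attribute, and ignores most of the real hCalendar parsing rules:

```python
from html.parser import HTMLParser

# Only a few hCalendar property names; the real format defines more.
HCAL_PROPERTIES = {"dtstart", "dtend", "summary", "location"}
# Elements that never get an end tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCalExtractor(HTMLParser):
    """Collect hCalendar properties from class attributes."""

    def __init__(self):
        super().__init__()
        self.events = []  # one dict per element with class="vevent"
        self._stack = []  # (tag, property name or None) per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({})
        prop = next((c for c in classes if c in HCAL_PROPERTIES), None)
        if prop and self.events and tag == "abbr" and attrs.get("title"):
            # Abbr pattern: the machine-readable value lives in title.
            self.events[-1][prop] = attrs["title"]
            prop = None
        self._stack.append((tag, prop))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if not self.events:
            return
        # Text belongs to the innermost open element naming a property.
        for _, prop in reversed(self._stack):
            if prop:
                event = self.events[-1]
                event[prop] = event.get(prop, "") + data
                break


def extract_events(html):
    parser = HCalExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace in each collected value.
    return [{k: " ".join(v.split()) for k, v in e.items()}
            for e in parser.events]


page = """
<div class="vevent">
  <span class="summary">My Conference opening</span>:
  <abbr class="dtstart" title="20060501">May 1</abbr> to
  <abbr class="dtend" title="20060502">May 2</abbr>,
  <span class="location">Hollywood, CA</span>
</div>
"""
print(extract_events(page))
```

The same page still renders as ordinary readable text in a browser; the class names only matter to programs that look for them.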
Re:I don't get it... by Anonymous Coward · 2006-07-11 13:43 · Score: 0

Not all microformats use the "class" attribute. See rel-tag, rel-license, etc.
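Those rel-based formats put the meaning in a link's rel attribute instead. A minimal Python sketch with the standard library, assuming rel-tag's rule that the tag is the last segment of the link's URL path and rel-license's rule that the link target is the license; the example.com URLs are placeholders:

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelExtractor(HTMLParser):
    """Collect rel="tag" and rel="license" links."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if not href:
            return
        if "tag" in rel_values:
            # rel-tag: the tag is the last segment of the URL path.
            # Ignoring a trailing slash is an assumption of this sketch.
            segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            if segment:
                self.tags.append(unquote(segment))
        if "license" in rel_values:
            # rel-license: the link target itself is the license.
            self.licenses.append(href)


page = """
<p>Filed under <a href="http://example.com/tags/microformats" rel="tag">microformats</a>
and <a href="http://example.com/tags/semantic%20web/" rel="tag">semantic web</a>.
<a rel="license" href="http://example.com/license">Some rights reserved</a></p>
"""
parser = RelExtractor()
parser.feed(page)
parser.close()
print(parser.tags)
print(parser.licenses)
```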
Re:I don't get it... by jandrieu · 2006-07-11 13:52 · Score: 2, Insightful

Your technique hides the semantic data from normal view and forces the author to replicate what they don't want hidden.
With microformats, the data is presented once, with a few simple tags, and is then available to both HTML viewers/users and semantic parsers.
Re:I don't get it... by Anonymous Coward · 2006-07-11 14:45 · Score: 0

No, you're still not getting it.

The ugliness of this "microformat" thing is that it's shoehorning the function of elements into attributes: you have an attribute (in this case, the "class") that determines what the element really represents. WTF? This is best represented in XML: make an XML document type that specifies all these elements properly.

I mean, why do you need HTML for this at all? Your browser can display XML documents if you have an appropriate stylesheet (which these "microformats" also need anyway).

Namespaces are not the answer: they're just a hack on top of XML. When namespaces were invented, the issue of document *validity* was pushed to one side, to be resolved later (never was). So as soon as you start using namespaces you lose the benefits that XML was designed to deliver: clearly defined document structures that can be validated.
Re:I don't get it... by stonecypher · 2006-07-11 15:00 · Score: 1

Well, actually it's what XHTML is for - namespaces are just to prevent name conflicts, like namespaces in C++. Sure, XML is for custom markup, but Microformats are about embedding formats, not creating them. It's a subtle, and some would contend, pointless difference; that said, given what you said, I'm willing to bet you'll see the importance.

But, yes, you're right to point out that the buzzword web is reinventing yet another tool needlessly and badly.

--
StoneCypher is Full of BS
Re:I don't get it... by mk_is_here · 2006-07-11 15:10 · Score: 1

Why not XSLT? Create a self-defined XML format, then attach an XSLT template to it. And we don't even need a server-side script to make it work!
Re:I don't get it... by jandrieu · 2006-07-11 15:56 · Score: 1

Your question implies the answer.
XSLT is a new language to learn. Defining your own XML can be tricky. Integrating it on the server takes some effort and if you want that transform executed on the browser, it certainly will NOT work with as many clients as plain ol' HTML or XHTML.
Microformats OTOH exist as socially defined semantic packages based on real world usage (meaning they've been through the wringer and had the bugs worked out, mostly). The author doesn't have to define their own language or learn a new one; they simply use the (X)HTML and CSS they already understand with a few simple tags, and their page is now part of the semantic web and works with all modern HTML browsers.
In short, it is simpler for those common cases that fit in existing microformats.
And still no server-side script...
Re:I don't get it... by mk_is_here · 2006-07-11 17:53 · Score: 1

How tricky is it to self-define a lightweight XML format? Use whatever elements you like, and design your own data structure that suits you best. Why do we need to design a new language?

How is it different from calling a server to output XML and output HTML/XHTML? Which modern browser today does not support XSLT? Firefox, Internet Explorer? (Yes, Opera will support XSLT 1.0 in the coming version 9)

BTW, there are server-side XSLT processors (for very old browsers' sake). For instance, this, this and this.

And finally, what's the point to make the document semantic if the browser ignores it?
Re:I don't get it... by jandrieu · 2006-07-11 21:47 · Score: 2, Insightful

*Any* design activity is more complicated than copying a proven, open source design. And if you want that design to be understood by someone else, you still need to (correctly) use a common vocabulary.

It is easier to use what you know (HTML+CSS) and rely on the technology you understand (IE/Firefox/etc). That's it. Some people like to play in new techno sandboxes. Others just need to publish their kid's soccer schedule on their webpage and aren't about to read the help files at their ISP (or SourceForge or the W3C or wherever) about how to install, configure, and use that XML/XSLT stuff. And given how vendors like to extend the functionality of "standards-based" technology, I expect it will take about as long for XML/XSLT to settle as it did for HTML. And if you've ever worked with HTML developers learning XML, you'll know how frustrating it is to transition from the extremely forgiving realm of HTML to the rigor of XML.

Easier is better for many.

The point is not for browsers to ignore anything. Browsers (or extensions) are building, or will build, tools that respond intelligently to embedded microformats. Microformats make it easy to transform content that would otherwise be thrown up in basic HTML+CSS, so that it is semantically accessible for those systems that are looking for it.

It's a pretty straightforward premise that the easier a technology is, the more people will use it, assuming there is value for doing so. If you still want to develop your own XML and write XSLT to generate HTML, go for it. If you think more people would rather learn XML/XSLT than use the HTML/CSS they already know plus a few microformats, then there isn't much more I can say.

-j
Re:I don't get it... by Fastolfe · 2006-07-12 02:51 · Score: 1

but Microformats are about embedding formats, not creating them

It seems to me that creating them is exactly what this is about. Taking a step back, what they're saying is, "XML is hard. But if you make up a pattern of HTML elements and reserve some class names, programs can parse out information in standard ways."

This is the same problem that XML namespaces were intended to solve. OK, so this works for a handful of "formats". Clever (and planned) use of CSS gets the data displayed and compatible user agents can more readily parse information out of it. But this solution doesn't scale! Eventually you're going to get "formats" that start to step on each other's toes. They use the same class names, or the same pattern of elements. Maybe you want an "event" but want to add some supplementary information about that event using another "format". Do they mingle together?

If IE can get off of its ass and properly support XHTML, this problem is already solved. Create your event in XHTML, and supplement it with XML tags or attributes from other XML namespaces to include the machine-readable information. If you're concerned about how to style this XML data, remember that CSS can style (or, by extension, hide) XML just fine.
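For comparison, this is how namespaces keep two vocabularies from colliding on the same element name: each name is qualified by a namespace URI, so a consumer asks for exactly the one it understands. A Python sketch using the standard library's ElementTree; the ev/bk vocabularies and their example.org URIs are placeholders, and the sketch covers parsing only, not IE's XHTML handling or validation:

```python
import xml.etree.ElementTree as ET

# An XHTML page carrying two made-up vocabularies that both use "title".
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://example.org/ns/event"
      xmlns:bk="http://example.org/ns/book">
  <head><title>Schedule</title></head>
  <body>
    <p>
      <ev:title>My Conference opening</ev:title>
      <bk:title>Conference proceedings</bk:title>
    </p>
  </body>
</html>"""

NS = {
    "x": "http://www.w3.org/1999/xhtml",
    "ev": "http://example.org/ns/event",
    "bk": "http://example.org/ns/book",
}

root = ET.fromstring(doc)
# Same local name, different namespace URI, so the three never collide.
titles = {
    prefix: [el.text for el in root.iterfind(f".//{prefix}:title", NS)]
    for prefix in NS
}
print(titles)
```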
Re:I don't get it... by stonecypher · 2006-07-12 05:26 · Score: 1

It seems to me that creating them is exactly what this is about.

Like I said, it's a subtle difference, and I don't expect most people to get it.

Taking a step back, what they're saying is, "XML is hard. But if you make up a pattern of HTML elements and reserve some class names, programs can parse out information in standard ways."

I've never seen anyone say that. Indeed, these are no different than XML itself, and are in fact valid XML. Please show me someone saying the words "XML is hard," or any actual evidence in that direction.

This is the same problem that XML namespaces were intended to solve.

No, it isn't, as I just said in the post you replied to. You can very easily read the W3 discussion that led to the creation of namespaces; they have nothing to do with any of this. Namespaces serve exactly one simple purpose, and it's the exact same simple purpose they serve in C++; indeed the parallel in C++, and the discussion that led to namespaces in C++ in the ISO community, was very heavily leaned on in the W3 discussions. Namespaces in XML were created solely for the purpose of preventing name conflicts. This is something well documented and easily researched. You really shouldn't claim the underlying motivation for a tool when said motivation is well documented and contrary to your claim.

But this solution doesn't scale! Eventually you're going to get "formats" that start to step on each other's toes. They use the same class names, or the same pattern of elements. Maybe you want an "event" but want to add some supplementary information about that event using another "format". Do they mingle together?

You seem to be re-iterating questions that were already discussed by other people. Please actually read the discussion tree before engaging in it. This has all been explained and resolved. The reason you think this problem isn't well solved by the solution is that this just isn't what the tool is meant to solve. Similarly, a Honda isn't very good at baking a cake. Someone who doesn't understand what a Honda is for might think they're for baking cakes, since there's an enclosure which builds up a well-controlled temperature which (by revving the gas) can be altered by the person running the car. But, it doesn't "scale well" (cough) because the car has too much ventilation and the cake starts tasting like exhaust.

If IE can get off of its ass and properly support XHTML, this problem is already solved.

IE already supports the part of XHTML that deals with these concerns. It has since IE4. Perhaps you should try it. The things IE has trouble with in XHTML are things like the underlying MIME types and DTD verifications, neither of which are germane here. If you build custom tags in IE, they work just fine. They have for almost 8 years, since before XHTML was even considered. On this topic, IE is in fact way ahead of the curve.

Indeed, it's not at all difficult to dig up MSDN examples of doing exactly these things from 1998.

Create your event in XHTML

Event? What do events have to do with anything?

If you're concerned about how to style this XML data

Nobody was concerned in that way.

remember that CSS can style (or, by extension, hide) XML just fine.

If you'd read the discussion tree you'd realize that we all took that for granted. Will you next tell us that if we're concerned about getting < > in the document that we can use entities?

Please don't join discussions unless you're willing to figure out what people are talking about. It's remarkably rude and conceited.

--
StoneCypher is Full of BS
Re:I don't get it... by Fastolfe · 2006-07-12 07:45 · Score: 1

I do not appreciate the condescending tone. Just because someone disagrees with you does not mean they are not literate or not paying attention to the discussion. Reasonable people can disagree reasonably.

Please show me someone saying the words "XML is hard," or any actual evidence in that direction.

My comment was not intended to be a literal quotation.

From the article:

You see, for a while now, people have tried to extract structured data from the unstructured Web. You hear glimmers of these when people talk about the "semantic Web," a Web in which data is separated from formatting. But for whatever reason, the semantic Web hasn't taken off, and the problem of finding structured data in an unstructured world remains.

He's referring to the "semantic web" generally, which most followers of the semantic web interpret to mean XML and the family of "semantic" markup languages built with XML.

The quoted article says this approach hasn't caught on. I rephrased that, with some creative license, as saying, "XML is hard." I don't think that's an inaccurate characterization. But it's also largely irrelevant to the point I was trying to make.

Namespaces in XML were created solely for the purpose of preventing name conflicts.

I completely, 100% agree. Perhaps I misspoke when I referred to namespaces by themselves. Namespaces themselves do not contain any semantics. The various XML languages do, however. XHTML has its semantics, and My Markup Language contains a completely different set of semantics. How do you embed My Markup Language semantics within XHTML? Namespaces. You don't overload HTML elements and assign new semantics to HTML tags that already have semantics. You don't set yourself up for collisions when two "formats" want to overload the same pattern of elements and attributes. That is what namespaces are here to solve.

The reason you think this problem isn't well solved by the solution is that this just isn't what the tool is meant to solve.

This doesn't change the fact that the problem exists. Both namespaced XML and "microformats" allow the embedding of additional (arbitrary) semantics within another type of document. If microformats inherently don't elect to deal with intermingled data, isn't that just another way of saying it has a deficiency?

IE already supports the part of XHTML that deals with these concerns. It has since IE4. Perhaps you should try it.

IE does not support XHTML unless it's transformed from XML. IE supports HTML tag soup, and it supports raw XML. You can either tell IE that your XHTML is HTML, in which case IE will interpret it as HTML tag soup, or you can tell IE that it's XML, in which case IE will treat it as raw, unformatted XML (no HTML semantics). If you have a piece of XHTML content that you want to deliver as XML (application/xhtml+xml or some other XML media type), it will not be interpreted as XHTML in IE, because IE does not support XHTML. That leaves three options. You can deliver it as text/html, in which case XML-aware applications that strictly honor media types will not notice your additional XML data. You can deliver it as application/xhtml+xml, in which case your XML-aware applications can extract information from it, but the page is unreadable in IE. Or you can deliver an XSLT-modified version of your content that IE can transform into markup it knows how to render. This was the problem I was attempting to discuss. With proper XHTML support, documents could be created in true, validating, standards-compliant XML/XHTML, and not only would XML-aware applications be able to extract useful information from them, but they would also render properly in popular browsers.
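A common workaround for the text/html vs application/xhtml+xml dilemma is server-side content negotiation on the client's Accept header. The sketch below is a simplified, hypothetical version (the function name is illustrative, and real negotiation weighs q-values more carefully): it serves application/xhtml+xml only to clients that explicitly list it, and falls back to text/html for everyone else, including clients that only send a wildcard like */*.

```python
def choose_media_type(accept_header: str) -> str:
    """Return "application/xhtml+xml" if the client explicitly accepts it, else "text/html".

    Simplified negotiation: q-values only distinguish a listed type (q > 0) from an
    explicitly refused one (q = 0), and wildcards such as */* are deliberately NOT
    treated as acceptance of XHTML.
    """
    qualities = {}
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as a refusal
        qualities[media_type] = quality
    if qualities.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


# A client that lists XHTML gets it; one that only sends */* gets text/html.
print(choose_media_type("application/xhtml+xml,text/html;q=0.9,*/*;q=0.8"))
print(choose_media_type("image/gif, image/jpeg, */*"))
```

Note that this only picks the header; it does not solve the underlying problem described above, since the text/html copy is still invisible to strictly media-type-honoring XML tools.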

Event? What do events have to do with anything?

I was continuing with the example given in the article.

It's remarkably rude and conceited.

And how do you think your post looked? Please drop the attitude.

OK. Who else... by frank_adrian314159 · 2006-07-11 12:35 · Score: 2, Funny

Who else read this:

If the "semantic web" is going to take off, then we need semantics, and pronto.

as:

If the "semantic web" is going to take off, then we need semantics, and porno.

--
That is all.

Re:OK. Who else... by Anonymous Coward · 2006-07-11 19:56 · Score: 0

If the "sementit web" is going to take off, then we need sementits, and porno. Duh.
Re:OK. Who else... by frank_adrian314159 · 2006-07-14 17:27 · Score: 1

Oooooh! Somebody woke up cwanky this mowning. What's the matter? Bad bottle of milk?

--
That is all.

Wheel of re-incarnation strikes again... by sreekotay · 2006-07-11 13:52 · Score: 2, Informative

Mixing presentation and data - good... bad... good. But it gets better a little, each time (maybe more of a spiral than a wheel).

We're using them on AIM Pages for module development (I cover it a bit here). It's a nice, simple standard, and the idea needed SOME name; don't make more of it than it is.

--
graphically speaking

History, failures, doomed to repeat by ekhben · 2006-07-11 14:20 · Score: 5, Insightful

This is a kind of neat idea, except, of course, if I have CSS that does something with, oh, say, a class of "dtstart". Sure, it's easy to recognise that ".vevent > .url > .dtstart" is a microformat data item for an hCalendar, but if I'm already using "dtstart" or "url" regularly in my markup so I can apply styles to those kinds of things, I'm pretty much SOL. Rewrite all your markup and CSS to stop using those names.
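On the parsing side, a consumer can at least scope its lookups so that a page's own "dtstart" elements outside any event are not misread as event data. The minimal sketch below assumes well-formed XHTML without a namespace declaration (so Python's ElementTree sees plain tag names); the sample page and helper names are hypothetical. It does nothing for the CSS side of the collision: a page rule like `.dtstart { ... }` would still style the event's date too.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the site's own markup already uses "dtstart" outside any event.
PAGE = """<div>
  <p class="dtstart">Styled as a date, not event data</p>
  <div class="vevent">
    <span class="summary">Launch party</span>
    <span class="dtstart">2006-07-11</span>
  </div>
</div>"""


def has_class(el: ET.Element, name: str) -> bool:
    # class is a whitespace-separated token list, so compare whole tokens
    return name in (el.get("class") or "").split()


def text_of(el: ET.Element) -> str:
    return "".join(el.itertext()).strip()


def naive_find(root: ET.Element, name: str) -> list:
    """Every element carrying the class, wherever it appears."""
    return [el for el in root.iter() if has_class(el, name)]


def scoped_find(root: ET.Element, container: str, name: str) -> list:
    """Only elements carrying `name` that sit somewhere inside a `container` element.

    (Nested containers would yield duplicates; this sketch ignores that case.)
    """
    hits = []
    for box in root.iter():
        if has_class(box, container):
            hits.extend(el for el in box.iter() if el is not box and has_class(el, name))
    return hits


root = ET.fromstring(PAGE)
print([text_of(el) for el in naive_find(root, "dtstart")])
print([text_of(el) for el in scoped_find(root, "vevent", "dtstart")])
```

The naive search picks up both elements, while the scoped one returns only the date inside the vevent.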

There's no namespacing. There's not even an ATTEMPT at namespacing. This will fast become an unmanageable hodge-podge of insanity, with common words used willy-nilly in class attributes.

The class attribute is defined as CDATA. That's it. You can use pretty much ANY character in it. There are a lot of characters that can't be used unescaped in a CSS selector, though, such as ":". See where I'm going with this? <div class="mf:vevent"> for a start. Better yet, <div class="hidden mf:vevent">, so that you can hide (or format) the block of data separately.
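The prefix idea can be sketched as a tiny helper: a parser keeps only the prefixed tokens and leaves the rest (like "hidden") to the stylesheet. The "mf:" prefix is the poster's hypothetical convention, not part of any microformat spec, and the function names are illustrative. The second helper shows that a stylesheet can still target such a class deliberately, because CSS lets you backslash-escape the colon.

```python
# Hypothetical convention from the post: machine-readable class tokens carry an "mf:" prefix.
PREFIX = "mf:"


def namespaced_classes(class_attr: str, prefix: str = PREFIX) -> list:
    """Return the prefixed tokens from a class attribute, with the prefix stripped."""
    return [tok[len(prefix):] for tok in class_attr.split() if tok.startswith(prefix)]


def css_selector_for(token: str) -> str:
    """Build a class selector for a token; ":" must be backslash-escaped in CSS."""
    return "." + token.replace(":", "\\:")


print(namespaced_classes("hidden mf:vevent"))  # ['vevent']
print(css_selector_for("mf:vevent"))           # .mf\:vevent
```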

Now, as if that wasn't bad enough, and, trust me, it IS bad enough, there's also the misuse of the "title" attribute and the "abbr" element. A machine-formatted date is not the expanded version of a human-formatted date, and the human-formatted date is not an abbreviation. A renderer trying to make sense of <abbr class="dtstart" title="10034134134T00">17th Smarch</abbr> will think "AHA! This here is an abbreviation, I will provide unto the user some means to see what that '17th Smarch' abbreviation stands for!" Usability disasters follow.
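To make the objection concrete, here is a minimal sketch of how a parser following the abbr convention reads a property value: for an abbr with a title, the title is the data and the visible text is ignored. That same title string is what a user agent treating abbr literally would present as the "expansion". The function name is hypothetical, and the sketch assumes markup without an XHTML namespace declaration.

```python
import xml.etree.ElementTree as ET


def property_value(el: ET.Element) -> str:
    """Machine-readable value of a microformat property element.

    For <abbr title="...">, the title is taken as the data and the element text is
    only the human-readable form; any other element contributes its text.
    """
    if el.tag == "abbr" and el.get("title"):
        return el.get("title")
    return "".join(el.itertext()).strip()


date = ET.fromstring('<abbr class="dtstart" title="20060711T1900">July 11th, 7pm</abbr>')
print(property_value(date))  # 20060711T1900
print(date.text)             # July 11th, 7pm
```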

So, in summary, this is the worst idea I've seen in HTML space since some bright spark said, "let's suggest that people use the 'text/html' content type for their XHTML markup!"

Re:History, failures, doomed to repeat by Anonymous Coward · 2006-07-11 17:12 · Score: 0

There's , but there don't seem to be many people using microformats taking advantage of it.

I agree with you and think that RDFa is a much more robust way of achieving the same goal, but small steps are better than none.
Re:History, failures, doomed to repeat by Anonymous Coward · 2006-07-11 20:10 · Score: 0

I agree with you completely about the namespace issue. For my own work, when using JavaScript behaviours, I always preface them with the object name and then a period:

<div class="behaviour.method">

but microformats don't even attempt this! As you rightly point out, abusing the abbr element and the title attribute in order to encapsulate the data in a manner that scrapers/readers can understand is appalling! Seriously, I lose faith when all this hard work on standardising, applying semantics and accessibility to pages is ignored and abused in order to provide access to data which, really, should be free of all this junk (from the reader's point of view).

Create an XML feed. Stop polluting the page.
Re:History, failures, doomed to repeat by thePowerOfGrayskull · 2006-07-12 02:55 · Score: 1

There's no namespacing. There's not even an ATTEMPT at namespacing. This will fast become an unmanageable hodge-podge of insanity, with common words used willy-nilly in class attributes.
Human sacrifice, dogs and cats living together -- mass hysteria!
Re:History, failures, doomed to repeat by jt2190 · 2006-07-12 06:04 · Score: 1

This is a kind of neat idea, except, of course, if I have CSS that does something with, oh, say, a class of "dtstart". Sure, it's easy to recognise that ".vevent > .url > .dtstart" is a microformat data item for an hCalendar, but if I'm already using "dtstart" or "url" regularly in my markup so I can apply styles to those kinds of things, I'm pretty much SOL.
Not necessarily. If the existing style rules don't look ugly when applied to the microformat, then no problem. Otherwise, do exactly what you said: add style rules for .vevent > .url > .dtstart and for .vevent > .url.

HoTMetaL by Doc+Ruby · 2006-07-11 14:45 · Score: 2, Insightful

And I think that muddling data and presentation without explicit distinction is exactly what was wrong with HTML. Which we just spent a decade slightly recovering from. I guess IBM has made a lot of money on crappy tools, good tools to extract data from crappy data, and extra money for doing it right.

--
make install -not war

Pingerati from Technorati by otisg · 2006-07-11 14:57 · Score: 1

The VERY relevant site that Jack Herrington forgot to mention there is Pingerati. That is THE site through which all these Microformats are shared. The system is based on pings, much like the rest of the blogosphere. Both Pingerati and Microformats have a major force behind them: Technorati.

--
Simpy

hResume and Emurse.com by arudloff · 2006-07-11 16:41 · Score: 1

We're looking to implement hResume on Emurse.com web resumes here in the next couple of days.

I'm really excited about being able to push the standard some. We've been wondering about the possible negative effects, though, in terms of screen scrapers (alex.emurse.com, for instance). Anyone have any thoughts?

We've built hResume support to be configurable by the user, if it proves to be an issue. Just wondering how we should initially offer it.

Such crap by Anonymous Coward · 2006-07-11 16:42 · Score: 0

HTML, DHTML, XML, XHTML, XML etc. etc. ad freaking nauseam, uhhhhhg!

This is turning into PURE alphabet soup and thus into pure GARBAGE! CSS, CSS2, CSS3 and more garbage yet to come, I am quite sure.

How about TextBox(OrgPointXY,EndPointXY,Font,Color,data) called like:

TextBox('10,10','100,100','arial','Red','Hello world!')

Or let's do it one better:

How about TextBox(OrgPointXY,EndPointXY,Layer,Font,Color,data) called like:

TextBox('10,10','100,100','1','arial','Red','Hello Nurse!')

Or how about:

Image(OrgPointXY,ImageName,ScaleFactor,Layer) called like:

Image('0,0','HotBabe.jpg','100','1');

Hmmm, let's see: the browser would render the image of the hot babe and then render the text 'Hello Nurse!' on top of it!

WOW! Now how many lines of HTML & CSS would I have to write to do that?

The problem with the web is it has been designed by a bunch of academics who do not have to do real actual work aside from getting papers published.

Publishing to the web could be made easier by an order of magnitude by that one simple concept; being able to put something where you wanted it, absolutely, with a direct statement.

Ohh, you want a scroll bar for that text box? How about:

TextBox(OrgPointXY,EndPointXY,Layer,Font,Color,Decoration,data) called like:

TextBox('10,10','100,100','1','arial','red','VScroll,HScroll',data)

Imagine how much faster a browser would be if it didn't have to parse a few thousand lines of CSS.

KISS!

I Was Going To Say... by Carcass666 · 2006-07-11 16:51 · Score: 3, Interesting

I was going to say "I Don't Get It" but somebody beat me to it.

I think the title of TFA "Separate data and formatting with microformats" is a bit ironic since it's about wedging your data into a web page in such a fashion that somebody might be able to pull it back out.

If you want to make your data available there are all sorts of standard and more efficient ways of doing it than embedding it in the presentation layer. If somebody is going to all the trouble to create a parseable human-readable page, why wouldn't they go to about the same amount of trouble and make a far more efficient and standard RSS feed? What about the buzzword of the last few years, SOAP? Hell, what about XML?

From TFA:

How great is that? I have one script that reads a page with calendar items and exports it as XML. Then, I have another page that turns that XML back into calendar items. The original script can then read that page and come out with the same data. It's definitely a circular action.
Okay, maybe it's not that great.

I agree. This reminds me of the lame number tricks where you have somebody pick a number, add something, multiply it by something, blah blah blah, then you take the result, divide it by 7 and give them their original number, because you had it all set up ahead of time. If they screw up their calculations, the trick doesn't work. In this thing, if you screw up embedding the text within the HTML (plenty of ways to do that), the trick doesn't work, and it doesn't accomplish much even if it does.
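The circular round trip described in the quoted passage can be sketched in a few lines. This is not the article's code: it is a hypothetical Python version that handles only an illustrative subset of hCalendar properties, assumes well-formed markup without a namespace declaration, and uses the abbr/title convention for machine values.

```python
import xml.etree.ElementTree as ET

PROPS = ("summary", "dtstart", "location")  # illustrative subset of hCalendar properties


def _classes(el):
    return (el.get("class") or "").split()


def _value(el):
    # abbr/title convention for machine values, element text otherwise
    if el.tag == "abbr" and el.get("title"):
        return el.get("title")
    return "".join(el.itertext()).strip()


def hcal_to_xml(xhtml: str) -> str:
    """Page with hCalendar markup -> plain <events> XML."""
    events = ET.Element("events")
    for box in ET.fromstring(xhtml).iter():
        if "vevent" not in _classes(box):
            continue
        found = {}
        for el in box.iter():
            for prop in PROPS:
                if prop in _classes(el) and prop not in found:
                    found[prop] = _value(el)
        event = ET.SubElement(events, "event")
        for prop in PROPS:  # fixed order so both directions agree
            if prop in found:
                ET.SubElement(event, prop).text = found[prop]
    return ET.tostring(events, encoding="unicode")


def xml_to_hcal(xml_text: str) -> str:
    """Plain <events> XML -> page fragment with hCalendar markup."""
    page = ET.Element("div")
    for event in ET.fromstring(xml_text).findall("event"):
        box = ET.SubElement(page, "div", {"class": "vevent"})
        for prop in PROPS:
            node = event.find(prop)
            if node is not None:
                ET.SubElement(box, "span", {"class": prop}).text = node.text
    return ET.tostring(page, encoding="unicode")


PAGE = ('<div><div class="vevent"><span class="summary">Launch party</span>'
        '<abbr class="dtstart" title="20060711T1900">July 11th</abbr></div></div>')
first = hcal_to_xml(PAGE)
print(first)
print(hcal_to_xml(xml_to_hcal(first)) == first)  # True
```

As the poster says, the loop only proves that the markup survives its own conventions; a page that misplaces a single class name silently drops that field.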

JSON (Javascript over the wire) by c0d3r · 2006-07-11 17:13 · Score: 2, Informative

Look into JSON. It's basically JavaScript data structures that you eval on the client. Why bother assembling thick XML that needs to be parsed on the client? XML is slow, and even slower if you have to XSLT it out of the XHTML.
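The difference in effort can be sketched with the same hypothetical event record in both formats. In the browser the JSON text would be handed to eval, as the post suggests (or, more safely, to a dedicated JSON parser, since eval runs whatever code the text contains); here Python's json.loads plays that role. The sketch shows only that JSON lands directly in a native structure while XML needs a mapping step; it does not measure speed.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical event record in both wire formats.
EVENT_JSON = '{"summary": "Launch party", "dtstart": "20060711T1900"}'
EVENT_XML = "<event><summary>Launch party</summary><dtstart>20060711T1900</dtstart></event>"


def from_json(text: str) -> dict:
    # JSON object literals map directly onto native dicts
    return json.loads(text)


def from_xml(text: str) -> dict:
    # XML needs an explicit mapping step: here, child tag -> child text
    return {child.tag: child.text for child in ET.fromstring(text)}


print(from_json(EVENT_JSON) == from_xml(EVENT_XML))  # True
```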

I'll get it. by rodentia · 2006-07-11 19:55 · Score: 1

I don't believe it was intended to contain an alias (in Sowa's sense) or a general nomenclatura, however. This innovation actually undercuts the *semantic web* fairly radically, by confusing names with types, proper nouns with classes, as discussed in the second chapter of his Knowledge Representation.

XML, as pointed out clearly elsewhere in the thread, is a conventional syntax for the representation of heterogeneous schemata. An XSL stylesheet is a deterministic means of defining the relationship between such schemata and mediating their discrepancies and gaps.

This method seems to be a social convention relying upon some contemporary user-agent (and user) behaviors. The article itself apparently conflates the functional separation of data and formatting with a system of semantic definition; though we can credit the author for recognizing this and other shortcomings in the article ("This code looks a bit complicated . . .") A far cleaner method by any measure is to mediate the relationship between domain semantics and presentation or syndication semantics via a SAX-driven XSL transform performed by either the client or the server.
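The SAX-driven approach can be sketched as follows, with one swap stated plainly: Python's standard library has no XSLT processor, so a hand-written SAX ContentHandler stands in for the stylesheet. It streams hypothetical domain XML (an <events> document) into presentation markup without building a tree. Element names and the output classes are illustrative assumptions.

```python
import xml.sax
from xml.sax.saxutils import escape

FIELDS = ("summary", "dtstart")  # hypothetical domain elements to present


class EventToHTML(xml.sax.ContentHandler):
    """Stream <event> records into hCalendar-style markup without building a tree."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.field = None  # name of the field element currently open, if any

    def startElement(self, name, attrs):
        if name == "event":
            self.out.append('<div class="vevent">')
        elif name in FIELDS:
            self.field = name
            self.out.append(f'<span class="{name}">')

    def characters(self, content):
        # text may arrive in several chunks; re-escape each for HTML output
        if self.field:
            self.out.append(escape(content))

    def endElement(self, name):
        if name == "event":
            self.out.append("</div>")
        elif name == self.field:
            self.out.append("</span>")
            self.field = None


def transform(xml_text: str) -> str:
    handler = EventToHTML()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


print(transform("<events><event><summary>Party &amp; cake</summary>"
                "<dtstart>20060711T1900</dtstart></event></events>"))
```

The domain XML stays the source of truth, and the presentation markup is a derived view, which is the separation the post argues for.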

--
illegitimii non ingravare

sufficiently complicated by rodentia · 2006-07-11 20:06 · Score: 1

Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp. -- Philip Greenspun's 10th Rule of Programming

--
illegitimii non ingravare

And he asked for a wake-up call . . . by rodentia · 2006-07-11 20:11 · Score: 1

when browsers have built-in support.

--
illegitimii non ingravare

We have this, only IE does not support it. by Jerk+City+Troll · 2006-07-11 23:04 · Score: 1

It appears you were thinking about the data URI scheme. Unfortunately, and very much like modern CSS standards, the only browser to not support it is the one with the greatest market share.
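A data URI (RFC 2397) embeds the resource itself in the URI, optionally base64-encoded. A minimal sketch of building one, with a hypothetical helper name and text/plain as an assumed default media type:

```python
import base64


def data_uri(payload: bytes, media_type: str = "text/plain") -> str:
    """Build an RFC 2397 data: URI carrying a base64-encoded payload."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


print(data_uri(b"Hello"))  # data:text/plain;base64,SGVsbG8=
```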

--
Join Tor today!

a standard that people are ACTUALLY USING! by MichaelMD · 2006-07-11 23:44 · Score: 1

Exactly! Sure, the idea of using CSS class names to represent something for a machine to read is not new, as it is an obvious one. I thought of it too when I first saw CSS used, just like I thought of using made-up tags to represent things when I first saw HTML... but THAT IS NOT THE POINT. The STANDARDISATION, the fact that LOTS OF PEOPLE ARE ACTUALLY STARTING TO USE IT, and the SIMPLICITY are what make microformats interesting.

For someone like me, who has been looking for many years for ways to make it easy for an events promoter to provide machine-readable data for a nightlife listings website ( www.spraci.com ) without needing to provide them with special software and then having to teach them how to use it, it's an exciting thing! Sure, the preferred way to add an event is to use the forms on the site, but not all promoters have the time to do it, and some may already have their events listed on their own sites. Why should they have to enter the same data over and over to get it listed on a few listings sites? See the problem?

You might ask, "What about RSS?" Think about it... Events listings are calendar data; they need DATES, and plain old RSS does not do that (unless you use extended versions like RSS+Event, but not much software out there uses that, which inevitably means people need to modify their software: not much good for most event promoters!). spraci.com and many other listings sites require event dates to be separate and machine-readable because people can look up events by date.

"What about iCal?" Is there a way to represent cities/countries/etc. in iCal? Listings sites that deal with more than one city need that kind of information. If you use hCalendar, you can combine it with hCard to specify the city/country!

For some of us who have been trying to get data syndication of this kind happening for years, and having to deal with a lack of standards (and of software using them) suitable for the average event promoter, I see microformats as a very good thing:

1. They are easy for people to understand and use without needing to spend hours reading documentation to figure out the basics of what they do. A simple example is almost self-explanatory.
2. They are not hard to parse with very basic XML/HTML/etc. tools; you don't need anything exotic or overly bloated.
3. Lots of people are actually already using them, which is pretty rapid uptake! (What use is a "standard" if nobody is using it?)
4. They actually try to address the real-world situation in a real-world way:
   - HTML is everywhere.
   - People want to create and consume data feeds containing data not handled well by plain old RSS.
   - People also want to embed data in other places where they might be using HTML.
   - People want the minimum of installing or modifying software to do it.
   - They want it NOW, with a minimum of fuss.
   - There might be more than one item to be represented on one page (that pretty much rules out using META).
   - They try to work with other existing standards where possible (e.g. hCalendar is based on iCal, hCard is based on vCard).

Yes, do check out http://microformats.org/wiki/ ... and if you are still not sure, check out some of the links on there to other sites using microformats for more real-world examples.
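The hCalendar-plus-hCard combination can be read with nothing more exotic than a standard XML parser, which illustrates point 2. The sketch below is hypothetical: the listing, venue, and helper names are invented for illustration, and it assumes well-formed markup without a namespace declaration. It pulls the event summary and date, then the city and country from the hCard nested in the event's location.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: an hCalendar event whose location is an hCard with an adr.
LISTING = """<div class="vevent">
  <span class="summary">Warehouse night</span>
  <span class="dtstart">2006-07-15</span>
  <div class="location vcard">
    <span class="fn org">Example Club</span>
    <div class="adr">
      <span class="locality">Sydney</span>
      <span class="country-name">Australia</span>
    </div>
  </div>
</div>"""


def classes(el: ET.Element) -> list:
    return (el.get("class") or "").split()


def first_text(scope: ET.Element, name: str):
    """Text of the first element under `scope` (inclusive) carrying class `name`, or None."""
    for el in scope.iter():
        if name in classes(el):
            return "".join(el.itertext()).strip()
    return None


def parse_listings(xhtml: str) -> list:
    listings = []
    for event in ET.fromstring(xhtml).iter():
        if "vevent" not in classes(event):
            continue
        listing = {"summary": first_text(event, "summary"),
                   "dtstart": first_text(event, "dtstart")}
        for card in event.iter():
            if "vcard" in classes(card):  # the hCard nested inside the event
                listing["city"] = first_text(card, "locality")
                listing["country"] = first_text(card, "country-name")
                break
        listings.append(listing)
    return listings


print(parse_listings(LISTING))
```

A listings site could then index each result by date and by city without the promoter installing anything.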

XML can be styled by Fastolfe · 2006-07-12 02:44 · Score: 1

If the parent document is XHTML, and the browser understands that, CSS can easily be used to style these additional non-XHTML elements any way you like.

I don't see how this is better than XML/XSLT. by poot_rootbeer · 2006-07-12 02:45 · Score: 1

This "Microformatting" concept is predicated on the idea that data is (or should be) human-readable in its default state, but with mechanisms that make it easier to translate it into something machine-readable. This seems backwards to me.

Humans only need to be able to comprehend the data structure at two points: input and output. In between, computers may perform a thousand different transfers and transformations on the data, and at those points, the ability to see the data in plain English (or plain Anyotherlanguage) is just excess baggage.

He mentions Webmonkey and Technorati as computer services which essentially work by screen-scraping content intended for humans and hacking it into something for computers. This is not to be encouraged.

The XML output of the author's sample transformation seems like a more logical default storage format for the data. It's easy and flexible to transform this data back into any format desired, and certainly easier than transforming from "Microformatted" XHTML to intermediate XML to target format.

Re:I don't see how this is better than XML/XSLT. by MichaelMD · 2006-07-12 16:46 · Score: 1

>This is not to be encouraged.

So if you had your way, we wouldn't have search engines like Google, etc. either?
