Online Comics Syndication in XML
gravling writes: "Jason McIntosh has written an interesting article on XML.com about ComicsML, a language he's invented to allow online comics artists to describe and syndicate their work. Using ComicsML can let you do similar things to the UserFriendly search engine, but on a web-wide basis."
User Friendly, eh?
Usar Freindley, Lunix friend.
(Yuo are WORST comic evar.)
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Despite the press, XML is NOT that easy to parse. The same hassles we experience with HTML parsers are magnified tenfold.
I think I speak for anyone who has actually written an XML parser when I say... What?
Of course XML is easy to parse. The difficulty in parsing HTML derives from it being widely abused. You can't rely on HTML to be well-formed when browsers like IE literally don't require you to close any tags you open (Closing a _table_ is optional, even. Whose bright idea was that?) In contrast, omissions like that simply aren't an option in XML. If your document isn't well formed, the parser won't try to parse it. End of story. (And if the parser does try, the parser is broken) Incidentally, people that write applications that use XML aren't writing the code to validate if a document is well-formed. If they are, they're wasting their time. Use a library, there are plenty of them, for virtually every popular language.
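To make the "if it isn't well-formed, the parser won't parse it" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample strings are made up for illustration; any conforming parser in any language should behave the same way.

```python
import xml.etree.ElementTree as ET

# Two hypothetical inputs: one closes every tag, one never closes <table>.
well_formed = "<table><tr><td>cell</td></tr></table>"
unclosed = "<table><tr><td>cell</td></tr>"


def is_well_formed(text):
    """Return True if the parser accepts text, False if it rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The library refuses the document outright; the application
        # never has to guess at what the author meant.
        return False
    return True
```

The application code never does any tag matching itself; it only decides what to do when the library says no.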
As for whether the document conforms to a DTD: yes, it's somewhat silly to host your DTD on a server that isn't readily accessible. And we all know there's no such thing as 100% uptime, so what's the Right Answer?
So it's not the holy grail. Only a fool would say it is. But it's a much better option than everybody just making up their own (often binary) formats for describing things, because it sets the ground rules.
Ten-fold, eh? I'd love to hear specifics.
> But your comment on Guile got me thinking; probably it is overkill. But perhaps the flexibility of a full programming language would be beneficial for configuration, although it may not be meant to solve the same problems as XML.
> Your post got me to (re-)look into Guile, but I was wondering if you (or anyone) had any more specific thoughts on what formats to use for configuration files, and what in particular you do with Guile that replaces what you would have done with XML.
I'm not a guru at either Guile or XML, and my use of Guile is evolving pretty quickly now that I have started using it regularly. For now, this is how I use it for configuration files.
I have a "data type" that I call a table, which is of the format (key data). The data can be more tables, giving a tree structure to the configuration, or it can be bottom-level data, serving as the leaves in a tree of data. So a simple example of a configuration file for a fictitious game would be:
[Sorry; I had to remove the indentation to get it past Rob's lame-o lameness filter. Lack of indentation really reduces readability.]
(configuration
(difficulty-level 7)
(sides
(good
(description "The Good Guys")
(restrictions no-nukes no-poison)
)
(bad
(description "The Bad Guys")
(restrictions none)
)
(ugly
(description "The Ugly Guys")
(restrictions no-teeth)
)
)
(graphics
(size (x 550) (y 490))
(theme penguins)
(images "mytiles.xpm")
(animation-speed
(chase-scenes 12)
(love-scenes 3)
)
)
)
In the "tree" metaphor, configuration is the root; difficulty-level, sides, and graphics are the first level of branches; and so on, down to the leaves where the sub-tables terminate in atomic data.
For simplicity, the following uses pseudocode rather than the actual Guile syntax.
You declare appropriate variables of the Guile SCM type and then use Guile's read to load the configuration file into your program as a Scheme object (without trying to evaluate it as a Scheme expression).
conf=read("~/.mygame/myconfig.scm")
Since you are using tables, you use Guile to define a function lookup(keyname,table) that converts the key-name string into a Guile symbol, and then looks it up in table table:
grap=lookup("graphics",conf)
speeds=lookup("animation-speed",grap)
tmp=lookup("chase-scenes",speeds)
...do whatever with the value...
tmp=lookup("love-scenes",speeds)
...do whatever with the value...
Your program just runs down the tree like that, looking for whatever data it needs. When you get to the bottom, you use a Guile built-in to convert the data to an integer, string, or whatever your program expects.
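For anyone without Guile handy, here is a rough Python sketch of the same walk down the tree. It is not the Guile code described above: `read_table` is a hypothetical stand-in for Guile's `read`, nested Python lists stand in for Scheme lists, and symbols and quoted strings both end up as plain Python strings. It assumes balanced parentheses and does no error handling.

```python
import re

# A token is a quoted string, a single parenthesis, or a bare atom.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def read_table(text):
    """Parse one parenthesised expression into nested Python lists."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])            # start a new sub-table
        elif tok == ")":
            done = stack.pop()          # finish it and attach to the parent
            stack[-1].append(done)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # strip the surrounding quotes
        else:
            try:
                stack[-1].append(int(tok))
            except ValueError:
                stack[-1].append(tok)    # treat anything else as a symbol
    return stack[0][0]


def lookup(key, table):
    """Return the sub-table of table whose first element is key, or None."""
    for entry in table[1:]:
        if isinstance(entry, list) and entry and entry[0] == key:
            return entry
    return None


conf = read_table("""
(configuration
  (difficulty-level 7)
  (graphics
    (size (x 550) (y 490))
    (animation-speed (chase-scenes 12) (love-scenes 3))))
""")

graphics = lookup("graphics", conf)
speeds = lookup("animation-speed", graphics)
chase = lookup("chase-scenes", speeds)[1]         # leaf value for chase-scenes
y = lookup("y", lookup("size", graphics))[1]      # nested lookups in one line
```

Each `lookup` returns the whole `(key data)` entry, so the caller decides whether to keep descending or to pull the leaf value out of position 1.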
Some data can be iterative, too. In the example, sides is a list that you can modify in your config if you want to define more player sides (say, for AI opponents). Your configuration reader just uses your lookup to find sides, and then iteratively loads one side record at a time until you run out of definitions. (In Scheme terms, you process the car with your lookup function, throw away the car, and continue your iteration on the cdr.)
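The iteration over sides might look something like this in Python, with nested lists standing in for the Scheme data. This is a sketch of the idea only; the `lookup` helper and the dictionary layout are assumptions for illustration, not the Guile code.

```python
def lookup(key, table):
    """Return the sub-table of table whose first element is key, or None."""
    for entry in table[1:]:
        if isinstance(entry, list) and entry and entry[0] == key:
            return entry
    return None


# The sides branch of the example config, written as nested lists.
conf = ["configuration",
        ["sides",
         ["good", ["description", "The Good Guys"],
          ["restrictions", "no-nukes", "no-poison"]],
         ["bad", ["description", "The Bad Guys"],
          ["restrictions", "none"]],
         ["ugly", ["description", "The Ugly Guys"],
          ["restrictions", "no-teeth"]]]]

sides = {}
# Skip the "sides" key itself, then handle one side record per pass,
# the Python analogue of processing the car and recursing on the cdr.
for side in lookup("sides", conf)[1:]:
    name = side[0]
    sides[name] = {
        "description": lookup("description", side)[1],
        "restrictions": lookup("restrictions", side)[1:],
    }
```

Adding a fourth side to the config needs no change to the reader, which is the point of making that part of the file a list.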
The lookup function is really easy to define, and you put it in your library directory so all your programs can use it. It just converts the keyname string to a Guile symbol, and then uses the built-in assoc to find that symbol as the key for anything in the cdr of table. If all your data is in the table format, it works to look up anything, working down the tree recursively. For instance, if you wanted the y size and didn't need anything else, you could do:
y=lookup("y",lookup("size",(lookup("graphics",conf))));
You can also easily define a recursive check_table that verifies that something you loaded is in fact a table structure, in order to trap errors early if a user has screwed up his config file.
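A recursive check along those lines could be sketched in Python as follows. The exact rules (a key must be a string, leaves must be strings or integers) are assumptions chosen for the example; a real check_table would enforce whatever the program actually expects.

```python
def check_table(obj):
    """Return True if obj has the (key data...) table shape all the way down."""
    # A table is a non-empty list whose first element is the key.
    if not isinstance(obj, list) or not obj or not isinstance(obj[0], str):
        return False
    for item in obj[1:]:
        if isinstance(item, list):
            if not check_table(item):       # sub-tables must be tables too
                return False
        elif not isinstance(item, (str, int)):
            return False                    # leaves must be atomic data
    return True
```

Running this once right after loading means a malformed config is reported at startup instead of surfacing as a confusing failure deep inside the program.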
The only things I don't use lookup for are the iteration as described above (but even then I use lookup to find the iterative definition, and then use it again to parse each element in the iterative structure), and to get the bottom level data out of the "leaves", e.g. to parse:
(y 490)
I have a library function get_int that accepts a leaf of the form (key integer), extracts the second element, and converts it from a Scheme integer to a C integer, and similarly for other atomic data types.
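In Python terms, a get_int helper might look like the sketch below. The Guile original converts a Scheme integer to a C integer; this version only mirrors the shape check and extraction, and the error messages are invented for the example.

```python
def get_int(leaf):
    """Extract the integer from a leaf of the form [key, integer]."""
    if not (isinstance(leaf, list) and len(leaf) == 2):
        raise ValueError("expected a leaf of the form (key integer)")
    value = leaf[1]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("leaf %r does not hold an integer" % (leaf[0],))
    return value
```

Similar one-liners for strings and other atomic types keep all the type checking at the leaves, where the program actually consumes the data.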
Also nice, Guile does garbage collection, so you can use it to splice things together out of the configuration and throw away the husks without having to explicitly collect all the trees of objects that you created.
--
Sheesh, evil *and* a jerk. -- Jade
Perhaps the specification could include something that makes sense of incomprehensible comics like Zippy the Pinhead and its imitators
Honestly, Zippy isn't that hard to get if you realize that essentially every joke is based on a reference to pop culture, generally from the '60s or '70s. Of course, there's lots of times that I "get" the joke, but don't find it particularly amusing.
At least two:
"HELLO KITTY gang terrorizes town, family STICKERED to death!"
(Personally, I was exposed to Zippy quotes before seeing the actual Zippy comic strip. I had been hoping that most of the quotes would actually make sense when taken in context.)
<STRIP>
<PANEL>
<CHARLIE_BROWN ACTION="RUNNING"></CHARLIE_BROWN>
<LUCY ACTION="HOLDING_FOOTBALL"></LUCY>
</PANEL>
<PANEL>
<CHARLIE_BROWN ACTION="RUNNING"><THINKING_BUBBLE TEXT="I'm going to kick it this time!"></THINKING_BUBBLE></CHARLIE_BROWN>
<LUCY ACTION="HOLDING_FOOTBALL"><GRIN STYLE="MISCHIEVOUS"></GRIN></LUCY>
</PANEL>
<PANEL>
<CHARLIE_BROWN ACTION="FALLING"><SCREAM TEXT="WAUUUGGHH!!!"></SCREAM></CHARLIE_BROWN>
<LUCY ACTION="YANKING_FOOTBALL"></LUCY>
</PANEL>
</STRIP>
Well, if that ain't funny, I don't know what is...
-----------------------
Stay in school, kids! Peace out, Dubya
The only comics that do not heavily use panel layout are the 3-6 panel comics found in newspapers. All of the mainstream comics that are popular on the newsstand from Marvel, DC or any of the other publishers require laying out 28-32 pages with ~6 to 10 panels per page.
Panels are not necessarily rectangular, and they may not align nicely. ComicsML seems to actually reduce the expressiveness of a dead-tree medium for the sake of making it techie-cool with XML.
an unabashed comics fan,
vic
I mean, Comics I Don't Understand is a useful resource but it assumes that the strip makes at least a particle of sense.
Although I suspect Scott Adams was right -- Zippy has one joke and it's on the reader.
Unsettling MOTD at my ISP.
Oh, you can zip it? Great, let me run out and link the zip libraries into my application. What? There's licensing issues? Well, what do I do now?
ZIP, gzip, and bzip2 are all available under very liberal free licenses (no copyleft restriction, OK to use in both closed and open source software).
gzip and bzip2 aren't difficult to use for intermediate (1 to 2 years of experience) C programmers either. I don't know about ZIP because I've never used it, but it's probably not much harder.
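As a rough illustration of how little code the "zip it" step takes when the language ships a gzip binding, here is a sketch using Python's standard `gzip` module rather than the C libraries discussed above. The `<strip>`/`<panel>`/`<speech>` markup is a made-up, deliberately repetitive sample, not real ComicsML.

```python
import gzip

# A hypothetical, highly repetitive XML document.
xml_text = ("<strip>"
            + "<panel><speech>POW!!!</speech></panel>" * 200
            + "</strip>")

raw = xml_text.encode("utf-8")
packed = gzip.compress(raw)          # compress in memory
restored = gzip.decompress(packed)   # and get the exact bytes back

ratio = len(packed) / len(raw)       # fraction of the original size
```

Markup compresses well precisely because the tag names repeat so often, so the size complaint mostly applies to uncompressed storage and transfer.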
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
We are developing an application (graphical stimuli presented to a subject, with psychometrics being recorded), and are deciding on how to go about dealing with configuration files, etc.
XML seems like a fairly decent way to store our configuration info (and it will allow an overall configuration to link to other sub-configurations, which is nice for our app.)
But your comment on Guile got me thinking; probably it is overkill. But perhaps the flexibility of a full programming language would be beneficial for configuration, although it may not be meant to solve the same problems as XML.
Your post got me to (re-)look into Guile, but I was wondering if you (or anyone) had any more specific thoughts on what formats to use for configuration files, and what in particular you do with Guile that replaces what you would have done with XML.
"It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
Thanks for the feedback. I need to clear up this statement; I didn't mean that I based ComicsML's first tagset around Western comics ideas to the exclusion of all else, but rather that I created them based on what I knew best, which I decided to label as 'Western' since I'm not nearly as familiar with manga, only enough to know that Eastern comics have developed their own idiomset, and I didn't want to look like I was ignoring it. (Ironically.
It's important to note that ComicsML's panel-description markup details what's going on logically, not physically. So there's no giant-sweatdrop tag, any more than there's a Western-style sweat-flying-off-the-forehead tag. ComicsML would, instead, have a this-character-is-nervous tag, or something similar. Things like this are visual idioms that are crucial to the comic, but not so appropriate to its descriptive markup.
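To show why logical markup helps with searching, here is a small Python sketch. The element and attribute names (`strip`, `panel`, `character`, `mood`, `speech`) are invented for this example and are not taken from the ComicsML tagset; the point is only that a query asks about what is happening, not how it is drawn.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical description of a two-panel strip.
strip = ET.fromstring("""
<strip title="Deadline">
  <panel>
    <character name="Alice" mood="nervous"/>
    <speech speaker="Alice">Is it due today?</speech>
  </panel>
  <panel>
    <character name="Bob" mood="calm"/>
    <speech speaker="Bob">It was due yesterday.</speech>
  </panel>
</strip>
""")

# Find every character marked as nervous, however the artist drew it
# (giant sweatdrop, flying sweat beads, or something else entirely).
nervous = [c.get("name")
           for c in strip.iter("character")
           if c.get("mood") == "nervous"]
```

The same query works across strips drawn in completely different visual styles, which a sweatdrop-specific tag would not allow.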
As for the other issues you raise, about unusual layout and non-verbal balloons, these are both examples of the many challenges and questions ComicsML has ahead of it. It's pretty much open to all suggestions, right now, and I'm glad you bring these up! Now I invite you and other interested parties to bring them up in email to me, or on the ComicsML mailing list (see esp. the ComicsML resource page), instead of on Slashdot, where they'll go away in a couple of days. ;)
Bzzt! So sorry, but you lose! Please play again, McIntosh-san!
Okay, thanks.
J
MacOS Open Source
jmac
First of all, XML documents don't need to conform to any DTD in order to be parsed or be useful. Documents that elect to specify DTDs indicate public URNs so that the DTD can be obtained from the network if it isn't present locally. That's why you distribute the DTD with the program. The public URN of a DTD is essentially for backup, in case a local version can't be found. There is no need to hit a remote server to parse or validate an XML document. No developer in his or her right mind would intend or require this.
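A quick way to see this for yourself, sketched with Python's standard `xml.etree.ElementTree`, which is a non-validating parser and, as far as I know, does not fetch external DTDs by default. The public identifier and the `.invalid` URL below are placeholders that cannot resolve, so the parse can only succeed if no network fetch is needed.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE points at a host that does not exist (hypothetical DTD).
doc = """<?xml version="1.0"?>
<!DOCTYPE strip PUBLIC "-//Example//DTD Hypothetical Strip//EN"
  "http://dtd.example.invalid/strip.dtd">
<strip><panel>Hello</panel></strip>"""

root = ET.fromstring(doc)            # parses without touching the network
panel_text = root.find("panel").text
```

Validation against a DTD is a separate step, and a validating tool can be pointed at a copy of the DTD shipped with the program instead of the remote one.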
I guess this is opposed to "superior" minds who spend their time grokking knock-off Unix-isms a decade or two out of date. Are you really making this argument (in public, no less)? XML is a simplified version of SGML, which has been around for years, and is NOT easy to wrap your brain around if you're not a "document head". XML was designed to eliminate the infrequently used complications of SGML and make it suitable for everyday use, without losing the underlying advantages of SGML. Because of this, it is fairly straightforward, but this is exactly its beauty. XML is human readable and robust, both huge advantages, not the least in distributed computing, which is why we're seeing it all over the place now.
If it's as "consistent" and "simple" as you indicate, then why is it so hard to parse? This is trolling at its best. The thing that makes XML so productive, and a significant advance to the state of the art, is the fact that you simply link in the pre-built, ready-to-run XML parser of your choice and it does all the parsing work for you. XML parsers exist for every language under the sun. The idea here is that instead of writing your own code for manipulating the low level structure of your data, you use someone else's standard code, and you worry only about the content of the data.
Let me say this again: There is no need to hit a remote server to parse or validate an XML document. You are just plain wrong.
Wrong. Your users will thank you for using XML, because they can actually see the data that's being stored & used by your application because it's human-readable. They will thank you because the format of the data is readily apparent, and can be used by other applications simply by parsing the XML document.
What are you smokin', Joe?
The only certainty is entropy.
Unless we're talking only indy artists (I doubt United Features Syndicate would want Peanuts strips easily travelling, and then being searched, on the web).
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
You do realize that's what XML is about, right? By itself, XML is no more useful than plain old SGML (though the syntax is nicer). Without grammars, XML is pointless for sharing data. Sure, you can use XML to do menial little things like handle configuration for an application, but where it really shines is the ability to specify a set of rules for marking up different types of data. Having multiple grammars Just Makes Sense (tm), as a single grammar can't be expected to gracefully handle all the many different applications for XML.
As for server downtime causing parser problems, I see two ways around it -- either distribute your schema so that others can download it and use it locally with their parsers, or have some method of "certifying" schemas, which would then be hosted somewhere stable like w3c.org. As the latter most likely won't happen save for the most visible of schemas, I think the former has the most potential. Sure, there are potential versioning problems, but those can be worked around.
I always thought it would be cool if someone would create a funny-page service. You pay X cents per month or whatever, and they make you a customized web page that simply displays all of the comics you specify for that day. Then I wouldn't have to load tons of megs worth of pages just to get my Ozy and Millie and Penny Arcade fix.
(SpeechBubble)Prepare to face the master...(/SpeechBubble)
(ActionBubble)POW!!!(/ActionBubble)
(ActionBubble)ZAP!!!(/ActionBubble)
(ActionBubble)BANG!!(/ActionBubble)
(SpeechBubble)You are no match for my Kung Fu skills!!!!(/SpeechBubble)
The Parking Lot Is Full archive's search engine (found at http://www.plif.com/archive/search.htm ) allows for search by Character, Character Type, Location, Theme, Elements, and Strip Type. It's pretty amazing.
--
By the Great Spirit, do we really need another XML grammar? Do we really need another obscure specification sitting on another server that will be down 10% of the time and cause parsers to choke, programs to hang, and tech-support desks to light up like Christmas trees?
I'm sorry to go off on such a rant, but I am SO tired of everything being done in an XML format. It's not that it's a particularly great solution, it's just that it's the new hot standard. Furthermore, let's face it, XML is real easy. So easy that very mediocre minds can grasp it and feel like they're "on top" of the current technological trend.
Puh-leeze
As a result we now have a plethora of half-baked, almost-finished grammar specifications littering the internet landscape and plugging up the W3C standards pipelines.
I'm making a prediction. Most of these standards will either (1) be forgotten or (2) be rushed through and signed off as standards. I hope and meditate for the first.
XML is great for some types of data, but its advocates are so blinded by its simplicity and consistency they overlook flaws immediately obvious to more experienced developers. Despite the press, XML is NOT that easy to parse. The same hassles we experience with HTML parsers are magnified tenfold. Furthermore, it often depends on grammar definitions that reside on remote servers. This introduces all the hassles of network-based programming into what should be simple standalone client applications. Finally, it's big. I mean REAL big. Oh, you can zip it? Great, let me run out and link the zip libraries into my application. What? There's licensing issues? Well, what do I do now?
Please, for Pete's sake, when you feel the temptation to create another XML grammar, think about what you are doing. Just say no. Your users will thank you.
If the lameness filter actually worked, would you even be reading this?
I doubt United Features Syndicate would want Peanuts strips easily travelling, and then being searched, on the web
United Media may not want that, but the other major comic syndicate (Universal Press Syndicate, IIRC) seems to have a good attitude about it...
Both syndicates have always had 'one month' of each strip available, but last year the Uexpress website (www.uexpress.com) made a drastic change.
Last November, they put all of their comics online in a 'back issue' format. Instead of only showing one month of strips, you can go back all the way to 1996 (or whenever their website started carrying the strip; Duplex goes back to August of '96). Calvin and Hobbes is being carried in its entirety (more or less: they are revealing one strip at a time, offset by 11 years from the original strip date, so today's strip is from April 18, 1990, but the run starts at November 17, 1985).
Contrast this with BC or Meg, which are so paranoid that they obfuscate the strip filename in a lame attempt to prevent someone from using a robot to download the strip.
You may not be able to get Dilbert or Peanuts, but it wouldn't surprise me if Uexpress.com indexed their comics like this.
I can't account for everything a comics artist can pull off, of course, but I did try to cover the major, conventional visual idioms that have developed in Western comics over the last century.
I think this line pretty much speaks for itself, but I will raise a few more points. The internet has allowed comics to pretty firmly break the traditional limitations of print. This DTD seems to want to codify everything inside those old limitations. That's a pretty limiting point of view, I think.
Where are the tags to show art that crosses multiple panels? Where are the tags to show 'visual' thought bubbles? Where is the anime-style giant sweatdrop tag? Where are the tags to show 'emotional' sound effects, as are often displayed in manga and manga-based comics?
Unfortunately, this DTD pretty firmly ignores everything that doesn't go along with western newspaper-style comics, despite the fact that the author wants to let people break out of those old traditions.
Bzzt! So sorry, but you lose! Please play again, McIntosh-san!
Will it allow me to magically recreate Famke Janssen by wearing a bra on my head and hooking my computer to one of her action figures, thereby getting me into all kinds of crazy hijinks and turning my brother into a Jabba the Hutt type creature?