IBM Unveiling New Transcoder Technology
JavaNPerl writes "This Infoworld article states that IBM is about to beta a transcoder that would translate content based on the client. This may get rid of some of the headaches in coding HTML and JavaScript for different clients one day and also make more content available for handhelds." It's like the Holy Grail - we keep seeing glimpses of potential systems, but this sounds like it may be the real thing.
I claimed that even the best-written HTML wouldn't be well-presentable on devices of widely varying screen sizes.
This is because, among other problems, people don't like to be shown part of a set of information. HTML implicitly assumes that every web page (HTML file) is a single viewer page, but not even my 21" screen will show all of most web pages (so most pages can't possibly be a single page).
The problem only gets worse when you add small screen sizes, such as the Pilot.
The solution is to be a little smarter about displaying information -- you have to think about how much your user can see at once, and be careful not to show them any half-lines of info or half-table cells. Essentially, you should do a single page of repagination every time the user presses the down arrow or pagedown.
Instead of doing this, modern file display programs either assume that their only job is to display the entire document (HTML, less, MS Word in draft mode) or only display optimally for a single page size which almost no users have on their monitors (Acrobat, Word in page preview, ghostscript).
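The repagination idea above can be sketched in a few lines. This is only an illustration, assuming the document has already been broken into indivisible blocks (a line of text, a table row) whose heights in screen lines are known; the names `next_page` and `paginate` are made up for the example.

```python
from typing import List, Tuple


def next_page(block_heights: List[int], start: int, page_height: int) -> int:
    """Return the index one past the last whole block that fits on the page.

    block_heights[i] is the height, in screen lines, of the i-th indivisible
    block. A block taller than the page is shown alone so that paging
    always makes progress.
    """
    used = 0
    end = start
    while end < len(block_heights):
        height = block_heights[end]
        if end > start and used + height > page_height:
            break  # the next block would be cut in half, so stop before it
        used += height
        end += 1
    return end


def paginate(block_heights: List[int], page_height: int) -> List[Tuple[int, int]]:
    """Split the whole document into (start, end) block ranges, one per page."""
    pages = []
    start = 0
    while start < len(block_heights):
        end = next_page(block_heights, start, page_height)
        pages.append((start, end))
        start = end
    return pages
```

A viewer would call `next_page` with the current window height each time the user presses pagedown, so resizing the window only changes how the following pages are cut.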
This IBM server is interesting because it forces a current format to be reasonably presented on multiple device -- or window -- sizes. It doesn't remove the need for a real document reader, but it sure is better than we've had before.
-Billy
What this may do (I couldn't tell from the article) is clean up dirty HTML to make it portable. If so, then yes, it's a fairly clever little piece of software.
It's not just limited to HTML, either, and may take input in a number of different formats (Word, PDF, SGML, XML, more?). The thing they have to ensure is that their transcoding backbone is extensible enough to cover all possibilities. If they fall into the trap of aiming for the lowest common denominator, it'll be doomed. Hopefully, IBM are smarter than that...
"The invisible and the non-existent look very much alike." -- Delos B. McKown
There've been a few posts on this topic already, and they seem to be headed only down one path from this... "html blah blah blah". Well, yeah that is one path this could take; but not the only, and in fact not the most important.
...
/. on one of those things... trust me when I say the experience sucks rotten eggs.
... just stop and consider the sheer volume that is ... the largest computer company in the WORLD needs to *overnight* revamp their entire site. The process took months to pull off, with entire teams of managers herding the cats, er um... designers, around to get it all accomplished on schedule and make the release date. Every page was copied to an internal server and modified with automated tooling, then every update that was made outside got mirrored inside until FINALLY, one night, all the server masters did a big remount and moved all that content out to the public. You can still see the results of this on many of their pages by viewing the source and looking for html comments like this:
... but not both ... and all three options are destroying my formatting ... and Rob wouldn't give me ampersand-l-t-semicolon or ampersand-g-t-semicolon in HTML markup when I asked a few months ago... hhrmmm... let's see what I can do with plain old ascii...... well, after several trips through the preview button and a lot of reworking this looks better...it still looks like crap, and I've now spent more time on FORMAT than on CONTENT... ##*^_@&^!)&@$%^!#)&^$#^$ HTML
Browsers are *one* target recipient, but by no means the ONLY one... palm pilots for example have very little use for full blown HTML 3.x - let alone CSS and embedded frames, but this technology can target them*. [ *note: I am assuming that this is the same technology that was demonstrated by the SanFrancisco project at javaOne this past June, where both a palm pilot and a browser received the same content tailored to their UI and ran the same application logic. If it is then what it can really do will blow your socks off.... if it isn't then I can't wait to see what Research has up its sleeve. ('cause ya's just KNOW us developers don't get license to play with stuff this cool on corp's time...). ]
Think about this as an *Information Developer* (not an HTML developer, or a "webdesigner")... what do you really want to accomplish? Separation of content from format? Yes. Targeted formatting for a wide variety of presentation systems? Yes! Maintenance of sanity in the process? YES!
OK, so let's play a hypothetical... you need to put up a simple content component (the building block of a larger information presentation). Let's say you want this component to be called a slashdot poll. So you script up the first poll's content:
[topic name="best PHB tormentor"
code="phbtorm"
choice1="frabble-do-hickie"
choice2="whakka-loofla"
choice3="nimrod-doodle"
choice4="source code to current project"]
(yeah, totally made up grammar... I'm sure it doesn't look anything like that.)
then you write the first conversion sheet, with a target of text/html
[b]Slashdot Poll[/b]
[FORM action="http://slashdot.org/pollBooth.pl"]
[B]$name[/B][BR]
[INPUT type=hidden name=topic value=$code]
[INPUT type=radio name=aid value=1]$choice1[BR]
[INPUT type=radio name=aid value=2]$choice2[BR]
[INPUT type=radio name=aid value=3]$choice3[BR]
[INPUT type=radio name=aid value=4]$choice4[BR]
[INPUT type=submit value=Vote][/FORM]
of course you'd really need a lot more (just look at what really is wrapping up the poll) and also be a tad more generic, so that you could have a counter that says how many options there are and a loop to iterate and build the form, plus all that presentation gorp that means nothing to INFORMATION DEVELOPERS. Then you'd turn around and create a second conversion sheet that tells your phonemail system how to present this as a VRS. "Today's Slashdot poll is $name. Press or say one for $choice1. Press or say two for $choice2. [...]" (I can already hear the voice of Stephen Hawking asking what the past tense of ping is....)
Now once you've written all those conversion sheets, you're done with them (unless you want to change your display style for a given target). From then on you can update your information in one form and guarantee that it will be "properly" presented in all your target platforms.
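To make the hypothetical concrete, here is a rough sketch of the same idea using Python's `string.Template`, which happens to use the same `$name` placeholder style. It is not how IBM's transcoder works (the article doesn't say); the `POLL` dictionary, the sheet names and the `render_*` helpers are all invented for the illustration. It uses real angle brackets instead of the square ones above, and loops over the choices instead of hard-coding four of them.

```python
import html
from string import Template

# Content, written once. Field names follow the made-up grammar above.
POLL = {
    "name": "best PHB tormentor",
    "code": "phbtorm",
    "choices": [
        "frabble-do-hickie",
        "whakka-loofla",
        "nimrod-doodle",
        "source code to current project",
    ],
}

# Conversion sheet for text/html.
HTML_SHEET = Template(
    "<b>Slashdot Poll</b>\n"
    '<form action="http://slashdot.org/pollBooth.pl">\n'
    "<b>$name</b><br>\n"
    '<input type="hidden" name="topic" value="$code">\n'
    "$rows"
    '<input type="submit" value="Vote"></form>'
)
HTML_ROW = Template('<input type="radio" name="aid" value="$n">$choice<br>\n')

# Conversion sheet for a voice response system.
VOICE_SHEET = Template("Today's Slashdot poll is $name. $rows")
VOICE_ROW = Template("Press or say $n for $choice.")


def render_html(poll):
    """Apply the HTML sheet, escaping content so it cannot break the markup."""
    rows = "".join(
        HTML_ROW.substitute(n=i, choice=html.escape(c))
        for i, c in enumerate(poll["choices"], start=1)
    )
    return HTML_SHEET.substitute(
        name=html.escape(poll["name"]), code=html.escape(poll["code"]), rows=rows
    )


def render_voice(poll):
    """Apply the voice sheet; no markup, just one sentence per choice."""
    rows = " ".join(
        VOICE_ROW.substitute(n=i, choice=c)
        for i, c in enumerate(poll["choices"], start=1)
    )
    return VOICE_SHEET.substitute(name=poll["name"], rows=rows)
```

Adding a fifth choice or a new poll means touching only `POLL`; adding a new target (green screen, PDA) means writing one more sheet and leaving the content alone.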
Some of you may start down that tired old line that this is what HTML is for, and that new features like CSS give you this. Well yeah, HTML _was_intended_ for this kind of thing - then the "webdesigners" got their hands on it. At that point you have to resort to half assed hacks like CSS to even attempt to preserve the format independent nature of any SGML.
HTML is OK for the role it has been led into, but it isn't fulfilling some of the niches people hoped it would because it has been bent too far into another "niche" - the web. Some of the varied devices (in addition to html and VRS mentioned above) that are potential targets include:
(*> our old friend the green screen - no, it is NOT dead!
(*> page readers - enabling tech. for the blind
(*> custom viewers - imagine having the poll as a captive tk/tcl app on your enlightenment docking bar?
(*> translation systems - the I in IBM is never forgotten... imagine the hassles in translating a billion web pages from English to say Hebrew (right to left) or Kanji (top to bottom, and (I think) right to left)
(*> PDAs - for those who think PDAs will consume HTML: lay off the Frappuccino for a while and get a firm grip on reality. Without trying too much to sound like the Linus sound bite about the Nokia 9000's 'mediocre phone, lousy PDA, miserable web browser' - try reading
But will anyone use it? Well, let's take two case studies of places that COULD have used it... for the first let's look at a major hardware company that lost out on a $250K deal because their web site people had "revamped" their entire corp. web presence to use all the nifty new toys and didn't have the time/resource to update all the old product datasheets... so they dumped them off the servers completely instead: "so they wouldn't clash" with their "consistent face to the consumer".
For a second let's take good ole Hollerith's Analytical Legacy... last spring they decided to change their page design for all external pages
<!--Left Navigation here... -->
Now that I've written my content I want to go back and reformat it for HTML... but alas I gotta choose either to show html tags (Extrans) or use html tags
A large part of the problem is that authors are focused too much on the visual presentation[1], rather than the semantic meaning of the data being presented.
People forget that denoting something as a list (be it ordered, unordered, or list of definitions) is more important than the list being displayed indented with little swirly bullets next to it.
Remember -- different page renderings are good -- not everyone has the same needs or wants from data presentation.
[1] This is especially silly when there's no guarantee that a page will be rendered visually
pooptruck
I have been playing with XML/XSL for a while, and it sounds like IBM's technology is similar, if not the same. XML seemed to be the hot topic for a while, but I have not yet seen any serious applications for it. Writing DTDs is hard and implementing them in Java applets is even harder.
Maybe this is what IBM has done... created a replacement framework for these tedious steps.
Yes, Webmethods does look useful. I'd have more confidence in it if their web pages were not moronized.
If you write clean html you won't have that problem.
--- A Jesus Fish eating a Darwin Fish only proves Darwin's point.
this sounds like it's competitive with webmethods, which is a type of integration/screen scraping crawler that is being used to web-enable sap, etc. essentially you can write a script to extract content from html, then turn it into xml, then use xsl to transform it to your target dtd. so: you write a script to extract the data, then you go ahead and write xsl stylesheets for your client types.
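a rough sketch of that extract / xml / per-client pipeline, using only the python standard library. the stdlib has no xslt engine, so two plain functions stand in for the xsl stylesheets here; the assumption that the interesting content lives in `<h2>` elements, and every name in the code, is made up for the example and has nothing to do with how webmethods or ibm actually do it.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Step 1: scrape the text of every <h2> out of an HTML page."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headlines[-1] += data


def extract(page):
    scraper = HeadlineScraper()
    scraper.feed(page)
    scraper.close()
    return scraper.headlines


def to_xml(headlines):
    """Step 2: put the extracted data into a neutral XML tree."""
    root = ET.Element("headlines")
    for text in headlines:
        ET.SubElement(root, "headline").text = text.strip()
    return root


def render_text(root):
    """Step 3a: stand-in 'stylesheet' for a plain-text client."""
    return "\n".join("* " + h.text for h in root.findall("headline"))


def render_html_list(root):
    """Step 3b: stand-in 'stylesheet' for a browser client."""
    items = "".join(
        "<li>%s</li>\n" % html.escape(h.text) for h in root.findall("headline")
    )
    return "<ul>\n" + items + "</ul>"
```

the point is that the scraping script is written once per source page, and each new client type only costs one more rendering step.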
-- your knees hurt, don't they?
What IBM is really going for here is a way to adapt HTML information into things like WAP (www.wapforum.org) Wireless Markup Language for cell phones & wireless gizmos, and into VoiceXML (www.voicexml.org) for plain-old-telephone access.
Many people are commenting that "clean" or "standards-compliant" HTML is already portable across sundry platforms, and therefore this product is only a crutch for sloppy content providers. This is absolutely not true! Having made many webpages myself, and two or three that actually see a lot of use, I know from experience that standard HTML is one of the least standardized lingos in computing.
The reason is quite simple: people don't upgrade their browsers. Look at www.gnu.org for pete's sake! That page is specifically designed to be Lynx 2.0 compatible because use of "novelty tags" like (included in the HTML 3.2 spec) will break those clients. As a result, the page is fairly ugly.
Choose an involved combination of "standard" tags and it's a fairly safe bet that Netscape 3.0 will display it differently than Opera, which will display it differently than IE4, which will display it differently than Netscape 4.5, etc, etc.
The human is the bottleneck. People don't see a powerful incentive to upgrade their browsers, so they don't. Hence webdesigners like Rob Malda spend weeks of headache time on making their pages BassAckwards 2.7 compliant.
This transcoder, if it works, will really be a boon.
-konstant
Yes! We are all individuals! I'm not!
You're on the right track if you're saying that this technology wouldn't even be needed if we could convince people to stick to standards. But you also need one other thing -- well-formedness.
You can stick to HTML 4.0, even the "Strict" dialect (which I encourage everyone to do!), and still have pages that completely blow up when pulled up outside one of the Big Two. On the website that I accidentally deleted some time ago, I had struggled for some time to not just reach for HTML 4.0 compliance but for well-formedness.
It meant using elements for what they were intended for. It meant never using a table for anything other than tabular data. It meant using when I wanted emphasis and it meant using when I wanted to mark up code fragments (mind you, I'm stuck using right now because /.'s HTML filter doesn't permit !) It took some fiddling.
But I turned out with a set of pages that were not only easier to maintain, but CSS applied very cleanly to them, making them pretty and consistent-looking, and they rendered perfectly on any hand-held or speech-synthesizing device you could throw at me. My information was useful to everyone, and that was the best high of all.
I strongly encourage everyone to pursue well-formedness. The more important stuff that is well-formed instead of hacked, the better browsers we'll get, too!
This is a useful tool which, if it works, will mean no more making lame hacks around poorly implemented elements of various browsers to create effects which one's clients often want implemented, against the best advice of their Web developer...
Also, such middleware for multiple deployability will allow those who are delivering fat content with thin design to do so very easily by coding the content in XML and then transponding...
Nothing earth-shattering here, but useful...
o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
At first I thought it was some silly translator from HTML that IE understands to HTML that netscape understands (silly) - but it looks promising - a proxy that allows HTML to be scaled down for handhelds. ....and you only have to write the site once - the scaled down stuff is done automatically - kick ass ...wish i had thought of this.
Imagine in the future, two web ports, 80 and 81, 81 for low bandwidth and handheld devices
Will we be able to tweak the transcoding process? Since _no one_ can know the one-best solution for every enterprise, will it be possible to tweak the output of such a transcoder to your application?? Or possibly modify the IBM-supplied defaults for each browser type??
Will the framework be available as an API (or preferably an open standard), so that in the future programs (RealEncoder, generator, etc..) could be written to automatically include information in their output for specific formats (i.e. text-only, audio-only, mixed).
This could be a huge step forward, especially in the days of PCS-based browsers, but can it be done in such a way as to not lock a company down to a specific vendor??
The "Top 10" Reasons to procrastinate:
10.