California Joins Open Document Bandwagon
Andy Updegrove writes "A legislator in California has decided that it's time for California to get on the open formats bandwagon. If all of the bills filed in the last few weeks pass, California, Texas, and Minnesota will all require, in near-identical language, that 'all documents, including, but not limited to, text, spreadsheets, and presentations, produced by any state agency shall be created, exchanged, and preserved in an open extensible markup language-based, XML-based file format.' What type of formats will qualify? Again, the language is very uniform (the following is from the California statute): 'When deciding how to implement this section, the department in its evaluation of open, XML-based file formats shall consider all of the following features: (1) Interoperable among diverse internal and external platforms and applications; (2) Fully published and available royalty-free; (3) Implemented by multiple vendors; (4) Controlled by an open industry organization with a well-defined inclusive process for evolution of the standard.'"
Minnesota also is considering open documents.
As long as the format meets criteria 1-4, I don't see why it's necessary to specify that it must be XML-based. Keep it simple, and all that...
Why not just require the format to be in ANY published standard format? "XML" by itself is meaningless, "extensible" is a loaded term (and a very bad idea when trying to write a way to keep things compatible). Why do lawmakers always have to over-specify things until the purpose of the law is lost?
-- 'The' Lord and Master Bitman On High, Master Of All
The dominoes are beginning to fall.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
XML is the future. It's the perfect format for any kind of data.
The draw of a markup language for documents is that you can print out the raw file and even a lay person can read it just by ignoring the markup tags. Even without knowing anything about XML, I could inspect the file format and write an XML-to-text converter in about 1 line of Perl.
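For what it's worth, here is roughly what that "ignore the tags" converter could look like, sketched in Python rather than Perl. The function name and the sample post are made up for illustration, and this is the naive version: it assumes no CDATA sections, comments, or entities, all of which a real converter would have to deal with.

```python
import re

def xml_to_text(xml: str) -> str:
    """Naively drop anything that looks like a tag, then tidy the whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", xml)
    return " ".join(without_tags.split())

# Hypothetical sample document, not a real file format.
sample = (
    "<post><subject>we should do this too</subject>"
    "<content>Open formats are good.</content></post>"
)
print(xml_to_text(sample))
```

It works on tidy input like this, which is the point being made, but it is a sketch and not something to run on arbitrary documents.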
<user="wwwillem">
<subject>we should do this too</subject>
<content>
What is good for government documents is also good for Slashdot posts.
</content>
</xml>
Browsers shouldn't have a back button!! It's all about going forward...
Anything from .Net to Perl can already parse XML.
Format is irrelevant - since these documents will contain legal-speak, they'll be unreadable anyway. ;)
biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!
N00b: Hey we have this data representation problem, we'll use XML!
Greybeard: Son, now you have two problems.
I want to delete my account but Slashdot doesn't allow it.
If Government intervention is what it takes to force a level playing field, I will accept it. But still I would prefer it if market forces create a level playing field instead of government mandates.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
We should set up a little competition: who can type a first post in XML. Remember, you have to type in all these < and > (which I now had to type as "&lt;" to show the ampersand ... this becomes a recursive nightmare :-).
In other news, Microsoft is quickly subsidizing 3 small companies to write quick and meaningless stupid plug-ins using OOXML as input, just to pretend that their format is "Implemented by multiple vendors" and on "diverse (...) platforms" (ie.: Windows 98, Windows ME, Windows 2000, Windows XP *and* Windows Vista)...
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Just called my CA Assembly rep to ask them to support the bill. Look yours up here.
It may not be perfect, but is a move in the right direction.
I guess "plain" text seemed too plain for the legislators. Better to make a law with a hard coded computer language, than a flexible variable.
Saskboy's blog is good. 9 out of 10 dentists agree.
MS made a format that fits the very definition of what they said will be required in this bill. Is this bill just going to lead to government organizations upgrading to the new Office? Technically, all of these things apply even if the implementation of the "standard" will later be forked by MS with their extend and extinguish model. In short, does this really mean truly open formats will get a boost? Or that MS's new format will seem like the solution to a problem they have practically invented?
Judges and senates have been bought for gold; Esteem and love were never to be sold.
I think the only document format that would qualify is ODF (by OASIS). It's the only well known document format, based on XML and extensible, open and implemented by different vendors and office suites.
Custom electronics and digital signage for your business: www.evcircuits.com
"Why do lawmakers always have to over-specify things until the purpose of the law is lost?"
...of course that is the federal process, and the states vary in their organization, but it is mostly the same. It all depends on how the states have drawn up the rules for their specific legislature.
This is over-simplified, but here goes... American laws are made in sub-committees of committees of the legislative body. The committees are packed with 'specialized' delegates, i.e. someone with a political stake or in the pocket of a special interest group (like Microsoft, OSDL, or Greenpeace).
Keeping that in mind, every law has to 'pass' through the upper committee after the sub-committee, before passing in the full legislative body. The extra wordiness is to satisfy the other 'specialized' delegates' demands.
To put it simply: they HAVE to make it ridiculously wordy or it will never become a law. There is just too much money involved. This means that all Microsoft, or anyone else, has to do is 'buy' an influential delegate in the sub-committee, or the chair of the committee, in order to kill this bill before it is even voted on in the full legislature.
"Our Constitution was made only for a moral and religious people. It is wholly inadequate to govern any other" -John Ada
...must be free-range, smoke-free, and grown under organic conditions in a carbon-neutral environment and driven to their respective lead-free file folders in ethanol-fueled hybrid vehicles.
Your company needs a blog, but (and this is critical) it won't work if it's part of your corporate strategy of appearing-to-look-really-hip. It works if one of your employees creates it on her own initiative, and the strategists leave her alone.
My turnips listen for the soft cry of your love
(/me ducks and runzlakhell...)
(though /me wonders... why the hell not ps? Guess it doesn't have all those neat little bracketed thingies in it that say "tech!" to the average politician)
Quo usque tandem abutere, Nimbus, patientia nostra?
Just specifying XML doesn't mean much, really:
... more binary crap...
<document>
Description of MS Open Format
<![CDATA[
37642364 78346478 23465789 34657834 65783465 78934653 47895634 78563478 65347856
56347825 63478256 34786578 34567893 45678934 65783456 78465783 46578346 57834567
34895723 48957348 90578934 75890347 58934758 93475892
]]>
</document>
- For the complete works of Shakespeare: cat
As in, do these laws also include stuff like CAD drawings, which currently get stored in Autodesk's proprietary format? That would certainly make me extremely happy, since AutoCAD's monopoly on the CAD industry is as bad or worse than Microsoft's monopoly on office applications.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Some documents need to be further processed.
For a computer, flat plain vanilla text is meaningless; it's just a long stream of ASCII (or UTF-8 if you need accented letters or more alphabets) characters.
Given a flat text, you can't easily extract the titles and build a table of contents, for example, because the titles aren't specifically tagged as such.
Therefore you need some kind of tagged format to be able to further process the documents. You can't do it with plain text (nor with ready-to-print formats like PS or PDF).
But, on the other hand, you don't need to restrict yourself to XML only. The format could use XML, HTML+CSS, CSV (for tabular data), SVG (for graphical data), LaTeX, RTF, YAML, Binary ML, C-like structures (like POVRay and similar), a specially designed format, or whatever else.
XML may have some advantages (widely available parsing libraries, technologies like XSLT for easy translation between standards, etc.) but that doesn't mean XML should be enforced. Any markup format should do the job, as long as it's well documented, implemented on several architectures/platforms/applications *including FLOSS*, and patent free (or with patents that specifically allow FLOSS implementations).
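To make the table-of-contents point concrete, here is a small sketch in Python using the standard library's `xml.etree.ElementTree`. The `<document>`, `<section>`, `<title>` and `<para>` element names are invented for the example; they are not taken from ODF or any other real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document; the element names are assumptions.
doc = """<document>
  <section><title>Scope</title><para>Which agencies are covered.</para></section>
  <section><title>Format criteria</title><para>Open, royalty-free, multi-vendor.</para></section>
</document>"""

def table_of_contents(xml_text):
    """Collect the text of every <title> element, in document order."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iter("title")]

toc = table_of_contents(doc)
for number, title in enumerate(toc, 1):
    print(f"{number}. {title}")
```

With flat text there is nothing equivalent to `root.iter("title")`; you would be guessing at which lines are headings. The same trick works with any tagged format that has a parser, not just XML.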
SB 446
So far, each bill has been filed and referred to the appropriate committee. However, the legislative session just started in January and things don't usually start happening until after the filing deadline on 2007-03-09.
There is one particular spreadsheet we have to email for a report - it is loaded with ActiveX controls and the like, and it works with Excel on Windows and nothing else. It's one of the instances where we have to pull out 'the Windows laptop' to do the report once a month.
But since that is a reporting method with contractors (not the general public), I bet it would be an exception.
Along with that there are other Windows-specific gotchas - one is the Access DB that another program has required us to use, and the third instance is a reporting site that is loaded with IE-specific ActiveX code (even if you spoof as IE, it doesn't work).
Every other state report we do sanely accepts either comma-delimited text uploads, plain old paper reports, or a 'most browser friendly' web form.
...if it doesn't solve the problem, use more! :P
Staring at a white background [on a computer screen] while you read is like staring at a light bulb — Maddox
Provided the format is well-known and well documented, any language can be used to write a parser for any file format. Now, the nice thing about XML in particular is that most modern languages have either built-in parsers or pre-written libraries of parsers available.
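As an illustration of the "built-in parsers" point, Python ships one in its standard library (`xml.etree.ElementTree`), and a well-formedness check is only a few lines. The helper name `is_well_formed` and the sample strings are mine, used here only as a sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the standard-library parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<post><subject>hi</subject></post>"))
print(is_well_formed('<user="wwwillem">'))
```

Note that this only tells you the markup is well-formed. It says nothing about whether the document follows any particular schema, which is a separate check.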
My blog
I am, of course, talking about Microsoft. They refuse to accept the Open standard.
Until that happens, there will be problems. Yes, you could have .odt documents sent internally, but what if someone has to send a document to someone outside the company? Microsoft Office does not recognize .odt, and if you think that you can train someone to remember to send .doc files to outside users, and keep internal documents to .odt, then I have a bridge to sell you.
Let's stop dilly-dallying and just change "-1: Overrated" to "-1: Disagree" or "-1: Doesn't Subscribe to Groupthink".
XML means it is readable by humans. You don't even NEED any kind of a program to get the text.
Where's my "score -1, not well formed" moderation option?
What next? Hasta la vista, Vista?
If California passes this resolution I can see two outcomes. 1. The state recognizes that ODF has to be used, scraps Office, and loads OpenOffice or StarOffice. Big win for the citizens of California, big loss for Microsoft. 2. The state recognizes that ODF has to be used and, because older versions of Office won't work with ODF, purchases Vista and Office 2007 for all state agencies. Huge loss for the citizens of California, huge win for Microsoft. Guess which is more likely?
3 states whose yearly budgets are under review are looking for ways to drive down existing IT costs by threatening to pass legislation that will get them huge discounts on operating system and office software .........
20th century Marxism is not progress...
The absolute unfortunate truth in this case is that it will not matter what requirements the state of California sets forth, because in the end it serves not the people but the income of the government.
You see, in California, we have this precedent of hiring under-motivated, under-educated people into roles to fulfill the status quo on the premise of serving equality.
This results in a rule that I call "Factor 4", whereby you can take the initial cost of any related project, service or resource requisition and multiply it by a factor of 4 to obtain the actual cost to the government.
Sadly, Factor 4 is a direct result of the mediocrity that has taken up residence within all of our government agencies. I cannot imagine a bigger nightmare than this one that I just read about. Half of the institutions within the government are filled with people who have no idea what that means, and lack the education to understand it.
With this being true, we open the door to committees, educational round-tables to determine educational requirements, requisitions for training, then post-committees to evaluate whether the needs were met, then another committee to determine whether the proper mixture of minority members was upheld, and then we'll add further layers of evaluation to ensure that all submissions qualified, with the sole purpose of perpetuating a verification process that checks itself sometimes 3 times over, with absolutely no guarantee that said process is accurate, predictable, or effective.
All this does is allow state governments the ability to ask for additional funding, which they will earmark with non-related items, and then fund other programs with the initial request.
Translation- the greater good for which said items are presented will be moderately served.
Outcome: Ho hum and whatever. Can't we think of better things to do with my tax money than fuddling around with this area of business? I say they throw out this status quo requirement and start paying people what they are worth so that we can get some really bright minds into our state governments.
XML Parsing Error: not well-formed
Location: http://slashdot.org/
Line Number 2, Column 6:
<user="wwwillem">
-----^
<user="Odiumjunkie"
<title="Re:we should do this too"</xml>
<quote id="wwwillem">
<pre><user="wwwillem"></pre>
</quote>
<p><span class="pedantic">Forget to close that tag much?</span></p>
</xml>
inevitably, I fucked that up
This should bring in some big bucks for certain projects at least...
SIG: TAKE OFF EVERY 'CAPTAIN'!!
I don't get what all the hoo-haw is and why we need courts or lobbying for any of this. I find it very difficult to write anything when my term paper or [insert your document here] isn't open. Sounds like a bunch of people just need to learn how to double-click.
I'm sure that by now you agree that doing a First Post this way, would be a tough challenge.... :)
Governor Schwarzenegger said in a press release, "The state of California has Terminated vendor lock-in. It's the End of Days for the Raw Deal and True Lies we were getting from Microsoft. Documents can now be backed up for Total Recall."
Support Right To Repair Legislation.
(1) Interoperable among diverse internal and external platforms and applications;
"Diverse internal and external..." I think diversity would include Linux distros...MS products don't run natively on Linux-based OS's - Partial failure
(2) Fully published and available royalty-free;
I assume "Fully..." means no secret binaries or APIs... "available royalty-free": define what a royalty is... as in, MS can't "choose" to whom to license it... and can it be passed on? - Partial Failure
(3) Implemented by multiple vendors;
I assume "implemented" means "used as a native data format by the application", not something that requires a "filter" to open or save it... Failure
(4) Controlled by an open industry organization with a well-defined inclusive process for evolution of the standard.
'nuff said...Complete Failure.
Of course much of that is due to the fact that TeX has been around for so long without any significant changes and, given enough time, XML formats will likely settle toward the same level of quality from different implementations. Still, TeX's consistency is impressive.
It's like that by design. IIRC, Knuth is very concerned with the stability of TeX, in terms of producing predictable output from a given input file. I've read that the plan is to completely freeze the codebase when he dies -- I think he described it as a point when "all remaining bugs will become features" -- and although others will be free to take the code and produce some other typesetting engine from it, "TeX" itself will be set in stone, so you'll always be able to take a TeX document and get the same output from it. This is represented by current version numbers that asymptotically approach pi (e.g. version 3.14, 3.141, 3.1415...) with each bugfix, where the final version will be marked by changing the version "number" to \pi itself. I think METAFONT approaches e in the same way.
I've always thought that this represented a pretty forward-thinking view. Not too many people really think too hard about what will become of their software after they die. But what do you expect from a guy who thinks that this is a stop-the-presses, call-your-sysadmin "dramatic improvement"? Now that's attention to detail. (Or, how about his taxonomy of diamond-shaped road signs?)
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Dell will preinstall linux on systems for a large enough order; state governments negotiate discounts with drug companies; is it that difficult to think that a small company will make the effort to load openoffice or some other odt-compatible app for the opportunity to transact business (not a fundamental right of freedom, mind you) with one of the largest customers they have the opportunity to get?
What is needed is an ODF compatible version of WordStar.
The idea that at an enterprise level there are multiple vendors (real vendors, not distributors) of a word processor and spreadsheet program is a joke. There are perhaps three, and the two I know of are OpenOffice/StarOffice and Microsoft. And there are huge questions about the enterprise viability of OpenOffice that have yet to be answered.
Also, the level of complexity for ODF is such that it is unlikely that every implementation is going to render it the same. This means you create a document with one application and all the form fields are lined up. It is then printed with a different application - still using ODF - and the form fields are shifted over slightly. Maybe just enough to move from column D to column E on the form.
The level of complexity is utterly absurd for any cross-application compatibility. Microsoft at least understands the problem and clearly indicates that such compatibility isn't going to happen. Reading their standard shows that. Without a committee overseeing development and implementation and certifying implementations, there will never be the level of compatibility that is required.
Besides, who gots mo money than they know what to do wit? The Gub Mit. (-- In Living Color). Let them pay to insert the thin end of the wedge into the MS monopoly.
What are they? All the "enterprise-level" products I have heard of are Office and OpenOffice. WordPerfect or PerfectOffice has been out of that game so long as to not even be a real consideration.
Are you including things like KOffice?
Where is the compatibility certification that says the documents are rendered identically? You know that is going to come up, sooner or later. Because it is going to be important at a government level. You have preprinted forms that are filled in on the printer. Without identical rendering the spacing may be off just enough to matter.
I vote -1 troll
If you are working with large datasets, you can easily end up tripling your file size.
...
. .,90
Think about this, if you have 90 datapoints recorded 100 times a second, that's 9K+ sets of tags that have to be decoded every second with at least 7 extra characters per tag set. That doesn't include the performance hit for encoding & decoding the datastructure.
<data>
<time stamp="[timestamp]">
<sensor number="1">5.00</sensor>
<sensor number="2">5.01</sensor>
<sensor number="90">5.00</sensor>
</time>
<time stamp="[timestamp+1]">
...
</data>
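If you want to put a number on the overhead, a quick back-of-the-envelope script does it. This is only a sketch in Python: the helper names, the dummy readings of 5.00 and the stamp value are all made up, and the layout simply mirrors the example above with the attribute values quoted.

```python
def xml_sample(values, stamp):
    """One <time> element holding one <sensor> element per reading."""
    sensors = "".join(
        f'<sensor number="{i}">{v:.2f}</sensor>' for i, v in enumerate(values, 1)
    )
    return f'<time stamp="{stamp}">{sensors}</time>'

def csv_sample(values):
    """The same readings as a single comma-separated line."""
    return ",".join(f"{v:.2f}" for v in values) + "\n"

values = [5.00] * 90                      # 90 dummy sensor readings
xml_bytes = len(xml_sample(values, 0))
csv_bytes = len(csv_sample(values))
print(xml_bytes, csv_bytes, round(xml_bytes / csv_bytes, 1))
```

The exact ratio depends on tag names and on how many digits each value carries, so treat the result as an order-of-magnitude check rather than a benchmark. Compression would also narrow the gap on disk, though not the parsing cost.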
For large, structured datadumps like that, CSV can actually be a faster way to work. A hybrid works even better to make a more flexible system with some of the best features of both.
<format>
<interval>
<unit>Second</unit>
<value>0.01</value>
</interval>
<field name='Sensor1' position='1'/>
...
<field name='Sensor90' position='90'/>
</format>
<CSV_Data>
1,2,3,4,5,.
1,2,3,5,4,...,90
...
</CSV_Data>
It uses XML to define the data format, but at the same time, it doesn't require a lot of processing to decode the actual dataset. Also the file size is minimally impacted.
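Here is a rough sketch of how a reader for that hybrid layout might look, in Python with only the standard library. A few things are my own assumptions and not part of the format as posted: the wrapping `<data>` root element (needed so the whole thing is one well-formed document), the quoted attribute values, the cut-down list of three sensors, and the function name `read_hybrid`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical file contents, shortened to three sensors.
hybrid = """<data>
  <format>
    <interval><unit>Second</unit><value>0.01</value></interval>
    <field name="Sensor1" position="1"/>
    <field name="Sensor2" position="2"/>
    <field name="Sensor3" position="3"/>
  </format>
  <CSV_Data>
5.00,5.01,5.00
5.02,5.01,4.99
  </CSV_Data>
</data>"""

def read_hybrid(text):
    """Return a list of (time_offset, {field_name: value}) tuples."""
    root = ET.fromstring(text)
    # The XML part is parsed once to learn the column names and the interval.
    fields = sorted(root.iter("field"), key=lambda f: int(f.get("position")))
    names = [f.get("name") for f in fields]
    interval = float(root.findtext("format/interval/value"))
    # The bulk data is plain CSV, so it goes through the much cheaper CSV reader.
    body = root.findtext("CSV_Data").strip()
    rows = []
    for index, row in enumerate(csv.reader(io.StringIO(body))):
        values = [float(v) for v in row]
        rows.append((index * interval, dict(zip(names, values))))
    return rows

rows = read_hybrid(hybrid)
for offset, reading in rows:
    print(offset, reading)
```

The design choice is that the expensive, self-describing part is read once per file, while every data row costs only a split on commas. One caveat: CSV text inside an XML element still has to escape `<` and `&`, which is harmless for numbers but would matter for free-form fields.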
Lobbiests would be people who are most like lobbies, I take it?
Media that can be recorded and distributed can be recorded and distributed.
-kfg
Oops!
Although the starting situation is mostly similar to OOXML's, it isn't quite the same.
Yes, there may be different ways to interpret the standard, and two different implementations may produce slightly different results (spreadsheet formulas, for example, aren't standardized yet). The difference is that this standard is controlled by a whole committee (OASIS) in which several software makers are represented, including FLOSS ones, and it's in their interest to have the best interoperability possible.
Thus there's a high probability that, faced with such a situation, the details of the implementation will be specified in the next OpenDocument revision, so that other software can do a better job of interoperating. In fact, the latest versions of AbiWord seem to come much closer to the original OpenOffice.org document. Or maybe they'll even create a new option that allows tweaking the parameters of the function in a documented way (where 2.67 best imitates the behaviour of OOo, 3.14 is KOffice's default, and 3.00 is what every new application is supposed to assume when the parameter is missing, according to the documentation).
And if some other product develops more functionality than ODF is capable of encoding, there's a high probability that an extension will be written and published with the next ODF revision.
In fact, ODF isn't as much direct memory dump of OpenOffice.org as SXW was. ODF has been further processed by OASIS. Whereas OOXML (for now) is still a direct memory dump.
As Microsoft is the sole and unique maintainer of the OOXML specification, it's not in their interest to maintain pixel-perfect conversion for competitors (they need competitors to be able to interoperate with their document formats, to shut the complainers up, but they need to be the only product that can promise 100% pixel-perfect imports).
As for differences in behaviour, the documentation runs to several thousand pages and is bloated with definitions of options like "" which aren't explicitly documented at all (it's only written that they will be deprecated and that implementations aren't required to react to them). Only MS-Office will ever be able to open them, by definition (AbiWord may be able to reverse engineer them, but it'll take more work than asking OASIS for better documentation).
You can bet that, if Microsoft adds some new functionality, they'll be the only one to support them as a paid-for extension... probably called 'Visual OOXML#'. You can be sure that they'll out-"Embrace, Extend, Extinguish" their own ECMA approved standard.
Or at least to produce as badly written as possible documentation.
What will make the difference between OOXML and ODF is Microsoft's willingness to cooperate (or lack thereof), and the OASIS committee's collective need for its members to collaborate.
The only potential way to save OOXML is to put it under the control of a group in which at least one FLOSS project is represented (say, WordView), which will have to grant full rights of use and promise not to patent-sue independent implementations, and which will force Microsoft to use OOXML instead of some "MS OOXML.net" extensions.
Which they'll never agree to.
<div class="commentTop">
<div class="title">
<h4>Re:<xml>we should do this too</xml></h4>
</div>
<div class="details">
by iabervon (1971)
</div>
</div>
<div class="commentBody">
<div id="comment_body_n">Slashdot posts are already xml, you know. Using a targetted schema like XMPP would actually be simpler than the status quo.</div>
</div>
All video formats are now illegal. (or is there now an XML video abomination?)
All audio formats are now illegal.
Probably all image formats are now illegal.
Whee.... this'll be entertaining.
No, you simply type it in a web page editor like Dreamweaver or NVU, then copy the html code from the html tab into the content tab, then copy the newly-created html code into the text box.
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
A lot of people miss this point. Thanks.
No other site I know of has such illiterate and apathetic editors. Come on guys, this is just fucking embarrassing. Did you even go to school? Do you even care that you look like fucking morons and drag down the credibility of the site? Fucking asswipes, do your job or you should get a boot helping your ass out the door.