Australian Stats Agency Goes Open Source
jimboh2k writes "The Australian Bureau of Statistics will use the 2011 Census of Population and Housing as a dry run for XML-based open source standards DDI and SDMX in a bid to make for easier machine-to-machine data, allowing users to better search for and access census datasets. The census will become the first time the open standards are used by an Australian Federal Government agency."
I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human-readable in a practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (line and white space), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job, so they are interchangeable.
Some drink at the fountain of knowledge. Others just gargle.
If you want to see some example DDI XML files, check out http://www.colectica.com/ddi. They have documented several public datasets. It is neat how they can document the survey as well as the data for the 2010 US Census.
"The census will become the first time the open standards are used by an Australian Federal Government agency."
Really?
http://xena.sourceforge.net/
Australia is openly embracing census data and enhancing its availability.
Canada's government is going out of its way to prevent census data collection.
134340: I am not a number. I am a free planet!
And with open source you share your code freely to help everyone... WOW, isn't this an oxymoron: gov't
The best part of having statistics is the ability to find correlations between different sets of data. For instance, do people living in suburbs with greater access to parklands live healthier lives? What is the household income for people who are over the age of 65, own 3 or more properties and use public transport on a regular basis?
I am guessing that the data they plan to release will be anonymised to a level that makes finding correlations very hard or impossible? You can obtain some high-level correlations by looking at data on a suburb-by-suburb basis; however, for much of the data they collect, suburb of residence isn't an important factor.
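To make the suburb-by-suburb point concrete, here is a rough sketch in Python. Every number, suburb label, and field name below is invented for illustration; none of it is ABS data or an ABS format. It computes a plain Pearson correlation twice: once over individual records, and once over suburb averages, which is roughly what you are left with after aggregation.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented records: (suburb, parkland score, health score) per person.
RECORDS = [
    ("A", 1, 50), ("A", 3, 60),
    ("B", 4, 52), ("B", 6, 64),
    ("C", 7, 54), ("C", 9, 68),
]

def suburb_means(records):
    """Collapse person-level records to one (park, health) average per suburb."""
    groups = defaultdict(list)
    for suburb, park, health in records:
        groups[suburb].append((park, health))
    return {
        suburb: (
            sum(p for p, _ in rows) / len(rows),
            sum(h for _, h in rows) / len(rows),
        )
        for suburb, rows in groups.items()
    }

# Correlation over individuals.
r_individual = pearson([r[1] for r in RECORDS], [r[2] for r in RECORDS])

# Correlation over suburb averages only.
means = suburb_means(RECORDS)
r_suburb = pearson([m[0] for m in means.values()], [m[1] for m in means.values()])

print(r_individual, r_suburb)
```

With this made-up data the two figures come out quite different, so a correlation read off suburb averages does not tell you what holds for individual households.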
Meanwhile, in other government agencies and private enterprise there are open file formats such as the geophysical SEGD and SEGY formats that have been used since at least the 1980s. That means you can read data files from 1982 on current software.
Closed file formats are an "innovation" of Microsoft and similar companies. It's not really any different from the bastards that write unreadable code in an attempt to provide job security.
Hopefully in the future some of the practices of elements of Microsoft and many others will be remembered like the claim salters and others with "sharp" business practices in the Old West.
We should find out what percentage of the population thinks that this is a good idea....
...and here's why:
It's official - Munich Linux migration is "dead - abandoned in all but name." - Linux
Yes, you read right: "Dead - abandoned in all but name".
Munich Linux migration is "dead - abandoned in all but name."
Last I heard it was a migration to open source and they were successfully using open source desktop applications. The operating system may be Windows rather than Linux but this still seems to be a victory for open source. On the desktop the applications are far more important than the operating system.
There is some difference. I'm not clear from the summary exactly what's going on.
XML is perfectly suitable for long-term data storage and exchange. You have namespaces, schemas, and millions of tools to handle it.
YAML is OK for storing configuration data. It's not even that good for anything else.
Also anyone who "parses in ad hoc ways" deserves to be slapped in the face.
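For what it's worth, using a real parser instead of ad hoc string matching is only a few lines. A minimal sketch with Python's standard `xml.etree.ElementTree`; the namespace URI, element names, and population figures are all made up for illustration and are not taken from the actual DDI or SDMX schemas:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: namespace, element names and numbers are invented,
# not real DDI/SDMX structures or real census figures.
SAMPLE = """<c:census xmlns:c="http://example.org/census">
  <c:region name="Canberra"><c:population>358000</c:population></c:region>
  <c:region name="Hobart"><c:population>214000</c:population></c:region>
</c:census>"""

# Prefix-to-URI map used when querying; the prefix here need not match
# the one used inside the document, only the URI matters.
NS = {"c": "http://example.org/census"}

def populations(xml_text):
    """Return {region name: population} by walking the parsed tree."""
    root = ET.fromstring(xml_text)
    result = {}
    for region in root.findall("c:region", NS):
        text = region.findtext("c:population", namespaces=NS)
        result[region.get("name")] = int(text)
    return result

print(populations(SAMPLE))
```

Because the lookup is by namespace URI rather than by literal prefix, it keeps working if a producer renames the prefix, which is exactly the kind of thing a grep or regex approach trips over.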
How many Jedi currently live in Australia?
If you mod me down the terrorists will have won
Then I saw the huge huge light of why white space indenting is so great. I could explain but I'm not sure I could have convinced even myself before trying it.
I tried. And the whitespace syntax still itches me. When the idea was original (I think back in the eighties, anyone remember Occam?), I thought "cool". Once Python arrived, I thought "meh". Since then, I've written a couple of K lines in Python, and still, I think it's nice to have *two* channels: block structure for the compiler (i.e. (..), {..}, begin..end, whatever) *and* indenting for the humans.
I do appreciate the extra flexibility this gives me.
For me, THIS IS AN ISSUE.
Still, it's a comparatively small issue. There are many things in Python which itch me far more than this.
Not that I think XML is a good idea. I think it's broken beyond repair.
YMMV, as always.
As the author of the Perl module YAML::Tiny, and the current maintainer of the original YAML.pm, I call troll on the parent.
YAML as a specification is way more complex than XML and it's way harder to implement.
And who in their right mind is going to read the raw census statistical quads directly? The point is moot.
XML is ideal for machine-to-machine communication. It's easily machine-readable, and easily debuggable by nerds (which is the bit of "readable" that really matters here). And machine-readable is what the ABS has in mind here as their goal.
Adam K (too lazy to log in)
The census will become the first time the open standards are used by an Australian Federal Government agency.
What the hell are you talking about? We use a variety of open standards every minute of every day across every department with any modern IT assets. I think what you meant to say was that this is the first time open standards are being used by an Australian Federal Government agency to communicate with the general public. Even then, it's not exactly news; it was going to happen eventually.
Many Australian federal agencies that I know still use closed-source products. All databases run Oracle; accounting: SAP; all desktops: Windows XP, and they use that XP to manage hundreds of Red Hat machines (via PuTTY). Every product they install has to be the "Enterprise" or "Corporate" edition. If it's not a proprietary product coming from a large company, they are not going to deploy it. That's why companies like Cisco, Juniper, HP, IBM, Oracle/Sun, Red Hat and SAP are big in Canberra (home of the Australian federal government). It's basically a license to print money.