Re:All this hype about XML
on
DTD vs. XML Schema
·
· Score: 4, Informative
That's funny, I just looked at the man page for gzip.
Gzip uses the Lempel-Ziv algorithm used in zip and PKZIP.
The amount of compression obtained depends on the size of
the input and the distribution of common substrings.
Typically, text such as source code or English is reduced by
60-70%. Compression is generally much better than that
achieved by LZW (as used in compress), Huffman coding (as
used in pack), or adaptive Huffman coding (compact).
Mind you, XML is highly repetitive in its tag use on long documents. Long as in multiple records, not necessarily byte length.
Now let's take a larger file, 'cause after all, modem users can download 5k of HTML really quick. I took the SOAP distribution from Apache (or was it Sun) and concatenated all the XML files in there together. 22k XML file. Not huge, but big enough for this example.
Here are my findings:
[caligraphy:~] spencerp% ls -al o.xml
-rw-r--r-- 1 spencerp staff 22118 Jan 23 21:21 o.xml
[caligraphy:~] spencerp% gzip o.xml
[caligraphy:~] spencerp% ls -al o.xml.gz
-rw-r--r-- 1 spencerp staff 3021 Jan 23 21:21 o.xml.gz
[caligraphy:~] spencerp% gzip -l o.xml.gz
compressed uncompr. ratio uncompressed_name
3021 22118 86.4% o.xml
Not bad for taking non-repetitive text with random XML schemas and getting 86.4%. Now imagine a larger one with a consistent schema. Compression goes even higher. Granted, it will be slightly larger than a binary format. But even a 100 MB file can be moved across a 100-megabit network in well under 5 minutes. And THAT is a lot of data.
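If you want to try the "consistent schema" case yourself, here is a rough sketch in Python. The record layout is made up for illustration (it is not the SOAP data from above), and the exact percentage will vary with your data, so treat it as a way to measure, not as a result.

```python
import gzip


def gzip_reduction(data: bytes) -> float:
    """Return the fraction of bytes saved by gzip (0.864 means 86.4% smaller)."""
    compressed = gzip.compress(data)
    return 1 - len(compressed) / len(data)


# Hypothetical record layout, invented for this sketch: many records, one schema.
records = "".join(
    f"<record><id>{i}</id><name>user{i}</name><status>active</status></record>"
    for i in range(1000)
)
xml_doc = f"<?xml version='1.0'?><records>{records}</records>".encode("utf-8")

reduction = gzip_reduction(xml_doc)
print(f"{len(xml_doc)} bytes, {reduction:.1%} smaller after gzip")
```

Swap in your own file's bytes for `xml_doc` to see what you actually get on real data.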
Btw, there is a fallacy with your math. If I get 50% compression of an XML file, which could have been implemented in binary format, it doesn't mean the binary format would be 49 times smaller.
The worst thing is that, if you have a very large XML doc with deeply nested and complex hierarchies, it's a killer on performance.
It sounds like whoever architected your DTD or schema fouled up. Or whoever is generating the data you need is giving you too much.
And try exporting large amounts of DB data into XML, and watch your server crash.
So you are blaming a buggy server on XML? What if it was... a large CSV file? Or some other format? Hell, Photoshop uses binary internal storage for working with its native format. It even uses scratch disk.
Besides, you are making a large generalization. I've written XSLT over large XML documents. They take a while to transform, but it works quite well. Maybe you have bad RAM?
XML is all nice and sweet on paper, but all that it can do is handle relatively small amounts of data. And it looks good when you can tell the suits that you can do stuff in XML.
Then you are using the wrong technology. If you are trying to deal with gig-sized files, you should prolly be using a SAX processor, since it takes up relatively little RAM. DOM is great for when you don't have JAXB (Java XML binding) available and you want your data in memory, in an OO/structured format.
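To make the SAX point concrete, here is a minimal sketch using Python's standard xml.sax module. The element name "record" and the RecordCounter class are assumptions made up for this example. The handler sees one event at a time and never builds a tree, which is why memory use stays small even when the input is huge.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record> elements as they stream past; no tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        if name == "record":
            self.count += 1


def count_records(stream) -> int:
    """Parse a file-like object (or filename) and return the number of records."""
    handler = RecordCounter()
    xml.sax.parse(stream, handler)
    return handler.count


# Tiny in-memory stand-in for what would be a gig-sized file on disk.
sample = io.BytesIO(b"<records><record id='1'/><record id='2'/><record id='3'/></records>")
print(count_records(sample))
```

With a real file you would pass the path or an open file object to count_records instead of the BytesIO.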
Anyone remember CORBA ? Or any of the other zillions of RPC-type mechanisms that people have jumped on the bandwagon of ?
CORBA is more than just a data format. It's an architecture. XML is not an architecture, it's just a data format.
The price of sitting through a few stiflingly boring and pointless standards meetings seems a small price to pay. All large IT companies employ 2 or 3 people whose job it is to front up to these meetings. Typically these people are articulate and highly versed ex-programmers but architecturally challenged and with little understanding of the real nature of building complex IT systems.
You give these people 0 credit. Really. They probably have real jobs doing real things, while also helping create these standards for the company's benefit.
Ultimately, these RPC mechanisms all end up as nothing - or rather, as only perhaps 1% of the eventual solution.
All that XML is, is an easy-to-parse, text based data transfer mechanism. And as the parent posting says, there are some nice tools around for it. Big deal. Probably you'd be silly to use anything else if designing a data transfer. But is it ever going to change the world ? Or even rock it a little ? No.
Disclaimer:Not reviewed for relevance or accuracy - results may vary.
I guess I forgot to mention you can compress XML rather well. It's just plain text. So you get what, 2% growth over a strictly binary format?
I suggest re-reading my post.
And don't make judgements on my efficiency. XML is a technology that can be used for great things. If you don't know how to apply it properly, it's no one else's fault other than your own.
Re:All this hype about XML
on
DTD vs. XML Schema
·
· Score: 5, Insightful
A year ago we had the XML craze and we converted all our internal protocols to XML. I discovered that XML was just a lot of hype about nothing. There is nothing self-describing about it. Or maybe there is, just like the section names in an INI file describe the keys in them...
Great thing about XML is, if you need to convert your communications, you can write XSLT against it to convert it while you convert your XML source.. easily. For instance, one vendor I worked with decided that the old protocol didn't work well anymore, and a new one would be better. Forget the reasons for the change, good or bad.
I plopped an XSLT processor in front of it. Took minutes to implement. In the meantime, I was able to properly rewrite the XML-producing code. So I had some flexibility in terms of patching the protocol quickly, while taking the weeks I needed to fix things right.
As for self-describing, what is more self-describing than HTML? You see a bold and italics tag around an element, you can easily figure out what style the text would be in. Yes, I know about CSS, but the point is, XML IS descriptive, so long as you use good names. Naming elements a, b and c is just developer fault.
If you use XML to develop a lower level protocol you end up with bloated 10k messages.
If in today's age of gigabit ethernet and cheap parts, you really really need to squeeze that extra bit through, compress the line. Seriously. Simplest case is using ssh with compression turned on (-C). Hell, it authenticates AND encrypts. If you are worried about anonymous access, there are other tools.
Well, what about Apache developers? They need a place to test stuff, no? Mind you, developers need a place for their milestones as well, regardless of whether it is production quality or not. When Apache gets up to 2.1.xx, the Apache foundation will start gamma testing to put these features into 2.2.xx.
Re:What exactly are the differences...
on
Apache 2.0.44 Released
·
· Score: 2, Informative
Because...
Production releases are more
- fully qa'd
- apache is more accountable if something goes wrong
- steady documentation
Dev versions are more
- unstable, they can have serious errors
- experimental, and have features that might be thrown away
- not fully documented, so using the greatest might be hard
- use at your own risk, it is a sandbox for development, not production quality
Uh.. you better hope you go through QA. Or would you rather be held accountable for production problems because your software doesn't scale, or does something "wrong"?
Programmers are NOT the best people to test their own code. Other programmers, maybe, but even that is debatable.
That I am in the same situation as you. I'd argue that the fewer people there are to run the stations in your area, the fewer stations you'll have. NYC has over 10 million people. wtf??
That's WKTU. The US's #1 dance station. This is where all new dance music that comes over from Europe premieres. No kidding. If it's hot in the UK, you'll hear it first. Best rhythm-format station in the country, period.
No you don't. Try digitallyimported.com or 92.3 at around midnite on Sundays (or was that Saturdays). You'll hear better stuff.
As for K-Rock and Hot 97, they play only the new music. You hardly hear Metallica, Pantera, Alice in Chains, Ozzy, Black Sabbath, Bush, Tool or other good bands on K-Rock. Last thing I wanna hear in the morning is another Puffy/P. Diddy rendition of another bad song, more Nelly (women and sex sell, dude) or one more Creed/Limp Bizkit spin-off.
In the past, getting listed was always the problem. There were a thousand different programs that used their own directories.
Look at the simple IRC network. It's not really that simple, but in terms of who is using what, it's just a simple layout. Whose network you prefer is decided by features and content. IM suffers from this too.
The same goes for the phone software. Whose directory would you want to be on?
Let me represent a small bit of NYC. We have a handful of stations. This is all we have on the FM band off the top of my head.
107.5 - r&b
103.5 - "dance"
101.something - jazz
100.3 - "current pop" music, what kids like
98.1 - new skool r&b
97.1 - old skool r&b
96.3 - classical music
95.5 - adult contemporary
92.3 - "current rock"
There are also about 3 or 4 latin stations. 0 competition. It really sucks. Hopefully, XM will be able to kill off FM completely and switch to a cheaper than cheap brand of "good" music stations. Or at least plentiful ones. Our stations don't even compete against each other. *puke*
You can have separate processes running on separate processors without the use of threads. All it means is that there is native code within NetBSD itself to support threads.
Threads let a process run more than one instance of itself, sharing all data, without starting a new process. With fork, all data is separate.. two forked processes, for the most part, cannot affect each other.
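A quick way to see the difference for yourself, sketched in Python. The names (counter, bump, demo) are invented for the example, and os.fork only exists on Unix-like systems, so this won't run on Windows.

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


def demo():
    """Return the counter as the parent sees it after a thread, then after a fork."""
    # A thread runs inside the same process, so its change is visible to us.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter["value"]

    # A forked child gets its own copy of memory; its change stays in the child.
    pid = os.fork()
    if pid == 0:
        bump()          # modifies the child's copy only
        os._exit(0)     # leave the child immediately
    os.waitpid(pid, 0)  # parent waits for the child to finish
    after_fork = counter["value"]
    return after_thread, after_fork


result = demo()
print(result)
```

The thread's increment shows up in the parent; the forked child's increment does not. To get data back from a forked child you need something explicit, like a pipe or shared memory.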
Argh, I left the parent post inside my message. Cut me off at Ultimately....
And how is XML bloated? Sounds like an ad post. :P
You need Unicode for internationalization, you want namespaces for differentiation of data, you want comments to make.. comments. Troll elsewhere.
Wow.. if someone got my window, on the 4th story next to a 2-lane street where there are no parks.. I'd be friggin impressed ;)
What's the difference, concept wise, between this and a floppy disc?
I can't pick up WLIR or WBAI at my house.
Well, if it's not gov't held, we kinda get a mess, no?
AIM, MSN, Y! and ICQ messengers
Java and C#
efnet and undernet
We'll just have a split on who is using what again.
Heh, I live in Brooklyn, near Coney Island. :) Too damned far.
101.1 is jazz :)
And who owns it? Who funds it? I doubt the gov't will.. or is it the registries? There's no central authority, like ICANN.. (ICANT?)
-s
Dude is just cross posting. Maybe he can't be bothered with having a slashdot account. Maybe he does. Who cares?
Really. Or does karma give men a bigger penis, women nicer breasts, and make your boss like you?
If the submitter is right, you can't play it anywhere. Uh... whoops
You mean a GPS-buster-buster? I'll have to counteract with my GPS-buster-buster-buster. :)
</obligatory The Big Hit reference>