The coolness of XML is not in the format (which sucks); it's in the technologies around it.
RelaxNG, for instance, lets you verify that your XML file is built correctly for your app: you write a RelaxNG spec for your XML file format, and then it verifies that all the mandatory fields are there, in whatever order is necessary, with the correct datatypes, etc, etc. RelaxNG processors are part of most major XML libraries now, so if you're writing Perl you can just tell your Perl library to validate your file and it's done. If you're editing in Emacs (with nxml-mode), you can point Emacs at your RelaxNG file, and have tab completion, error highlighting, etc, etc-- all customized for your file format.
XSLT lets you transform an XML file into another (possibly XML) file format. Need to convert XML into SQL INSERTs? Piece of cake. I use it to extract particular parts of an XML file and convert them into a Lisp structure with a significantly different ordering.
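For a feel of what the XML-to-SQL case involves, here is a minimal sketch. It is not XSLT (Python's standard library has no XSLT processor); it does the same kind of transformation with `xml.etree.ElementTree`. The `<people>`/`<person>` document, the `people` table, and the helper names `sql_literal` and `xml_to_inserts` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; element and attribute names are illustrative only.
SAMPLE = """<people>
  <person id="1"><name>Ada</name><city>London</city></person>
  <person id="2"><name>O'Brien</name><city>Dublin</city></person>
</people>"""


def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def xml_to_inserts(xml_text: str, table: str = "people") -> list[str]:
    """Turn each <person> element into one INSERT statement."""
    root = ET.fromstring(xml_text)
    statements = []
    for person in root.iter("person"):
        values = [
            person.get("id"),  # numeric id attribute, emitted unquoted
            sql_literal(person.findtext("name", default="")),
            sql_literal(person.findtext("city", default="")),
        ]
        statements.append(
            f"INSERT INTO {table} (id, name, city) VALUES ({', '.join(values)});"
        )
    return statements


if __name__ == "__main__":
    for stmt in xml_to_inserts(SAMPLE):
        print(stmt)
```

Generating SQL text like this suits one-off data loads; for loading from a program, a driver's parameterized queries avoid the quoting question entirely.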
Most modern web browsers are becoming CSS engines rather than HTML engines. So you can stick a CSS stylesheet reference at the top of your XML file and have the CSS render something that looks like what you want the user to see. The data file looks good to the app, and looks good to the user. With some browsers, you can also apply more powerful transformations using something like DSSSL or XSLT.
DOM gives you a standard data-manipulation API, so each program you write doesn't need its own data-access language. XPath gives you a language for more complex queries. XML Namespaces let users or apps tag their data with extensions. XInclude handles data sharing. You get all of these for free with XML.
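As a small illustration in Python (the catalog document and its `urn:example:ext` namespace are made up): the standard library ships a DOM implementation, `xml.dom.minidom`, and ElementTree, which supports only a limited XPath subset. Full XPath 1.0 needs a third-party library such as lxml.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical catalog document used only for illustration.
DOC = """<catalog xmlns:ext="urn:example:ext">
  <book id="b1" year="1999"><title>First</title></book>
  <book id="b2" year="2004"><title>Second</title><ext:note>extra</ext:note></book>
</catalog>"""

# DOM: the W3C-style API (getElementsByTagName, firstChild, ...).
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [
    b.getElementsByTagName("title")[0].firstChild.data
    for b in dom.getElementsByTagName("book")
]

# ElementTree's XPath subset, including an attribute-value predicate.
root = ET.fromstring(DOC)
recent = [b.get("id") for b in root.findall("./book[@year='2004']")]

# Namespaced extension elements are addressed through a prefix -> URI mapping.
notes = [n.text for n in root.findall(".//ext:note", {"ext": "urn:example:ext"})]

print(dom_titles, recent, notes)
```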
All of these are general technologies, not specific apps. So they should be usable in most major libraries in most languages. (If you're using Perl, I'd recommend XML::LibXML.)
Don't think of XML as just a file format, because that part sucks. Think of it as a buffet table of technologies. When you write a program, roughly 10% of the code does the program's actual processing; the other 90% handles I/O, data management, and other housekeeping. Using XML gets you a lot of that for free.
PS: I'm not an XML fanatic. A year ago, I was told to use XML for one particular project and was disgusted at the idea. I still think that XML gets a lot wrong, but I've come to recognize what benefits XML provides.
"What?" Yossarian froze in his tracks with fear and alarm and felt his whole body begin to tingle. "What did you say?"
"Catch-22," the old woman repeated, rocking her head up and down. "Catch-22. Catch-22 says they have a right to do anything we can't stop them from doing."
"What the hell are you talking about?" Capt. Yossarian shouted at her in bewildered, furious protest.
"Didn't they show it to you?" Yossarian demanded, stamping about in anger and distress. "Didn't you even make them read it?"
"They don't have to show us Catch-22," the old woman answered. "The law says they don't have to."
I have in my InfoCom game packaging an original sealed sachet which contains a "Microscopic Space Fleet".
Ooh, you still have the fleet? A dog ate mine.
What's your beef with the file format?
Part of it is just the usual: it's too verbose, and the data gets lost in the formatting. As a human-readable format it fails miserably, yet human readability shaped its form considerably. I also find that not everything fits nicely into the tree structure XML wants. (This is partially shaped by my latest project, which requires more general graphs.) There are things like XPointer, but they're the beginnings of pretty lame attempts to pretend that XML doesn't force everything into a tree.
The difference between attributes and elements isn't clear-cut, and providing both in such an arbitrary manner promotes confusion. Several people offer good guidelines on when to use which, but it's silly to have a file format that apparently needs such guidelines repeated so many times. "Attributes are for metadata," the claim goes, "and elements are for data." I'll briefly ignore (as most such recommendations do) the fact that metadata may need to be structured as much as, or more than, data. My main point in this paragraph is that the distinction between data and metadata becomes blurred, particularly when you have different processors working on the same document. And isn't that what XML is for? If the line between data and metadata were clear-cut, we wouldn't have things like processing instructions (meta-meta-data?) and the like. Certainly, the xmlns attribute is on a different metalevel than some program's name attribute; why not draw a new metalevel distinction for that? Trying to draw a line between data and metadata the way XML does is a futile exercise. Everybody's metadata is somebody else's data, so just use namespaces or something instead of trying to make the distinction inherent to the file format.
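The cost of that arbitrariness tends to land in consumer code. A minimal sketch, assuming a hypothetical `<user>` record whose `name` may arrive either way depending on who wrote the file; the helper `get_value` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same record: one uses an attribute,
# the other a child element. Nothing in XML itself says which is "right".
AS_ATTRIBUTE = '<user name="alice"/>'
AS_ELEMENT = "<user><name>alice</name></user>"


def get_value(elem: ET.Element, key: str):
    """Return a value whether it was stored as an attribute or as a child element."""
    if key in elem.attrib:
        return elem.attrib[key]
    child = elem.find(key)  # direct child with that tag, if any
    return child.text if child is not None else None


for doc in (AS_ATTRIBUTE, AS_ELEMENT):
    print(get_value(ET.fromstring(doc), "name"))
```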
Most people these days use object-oriented programming, and it's nice to be able to read in a file and have it generate an object tree of appropriate classes. But surprise! It doesn't work that way. The best time to do type assignment is during validation, and there are three main validation systems: DTDs, XML Schema, and Relax NG. DTDs are woefully inadequate in more ways than I can describe. Relax NG, while my favorite validation language, doesn't assign type information to the data as it validates, because of Relax NG's holy "Thou shalt not augment the infoset" mandate. Even though it has several constructs that just scream OOP subclassing, those constructs are visible only to the parser, not to the user. XML Schema does put type information into the infoset, so XPath etc. can get at it. Other than that, once you get beyond the trivial, it sucks in just about every other way: it's not orthogonal, its extensibility and reuse constructs are as pathetic as anything I've ever seen, and it reeks of something that was built by handing a slapped-together, application-specific spec to a committee after it had been pushed in several directions without any thought of graceful growth (which, of course, it was).
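When the validator won't hand you types, the tag-to-class mapping ends up hand-written. A minimal sketch of that fallback, with hypothetical `Shape`/`Circle`/`Square`/`Drawing` classes; apart from the tag check and `float()` conversions it does no validation at all, which is exactly the gap described above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


# Hypothetical domain classes; XML itself carries no such mapping.
@dataclass
class Shape:
    name: str


@dataclass
class Circle(Shape):
    radius: float


@dataclass
class Square(Shape):
    side: float


@dataclass
class Drawing:
    shapes: list = field(default_factory=list)


def build_shape(elem: ET.Element) -> Shape:
    """Map an element to a class by tag name, converting attribute types by hand."""
    if elem.tag == "circle":
        return Circle(elem.get("name"), float(elem.get("radius")))
    if elem.tag == "square":
        return Square(elem.get("name"), float(elem.get("side")))
    raise ValueError(f"no class registered for <{elem.tag}>")


def build_drawing(xml_text: str) -> Drawing:
    """Build the object tree: one Drawing holding one Shape subclass per child."""
    root = ET.fromstring(xml_text)
    return Drawing([build_shape(child) for child in root])


drawing = build_drawing(
    '<drawing><circle name="c" radius="2.5"/><square name="s" side="3"/></drawing>'
)
print(drawing)
```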
This is just what came to me quickly (although I've been wrestling with these issues for a while); I haven't stopped typing to think about what I'm typing. So I haven't carefully spelled out my issues, but that's okay; other people have written about their issues too. There are lots of places on the web that discuss the suckage of XML's file format. My personal favorites usually come from discussions among Lispers, because any Lisp programmer sees XML as sexps in drag.
Now that I've ranted about XML for a while, don't miss the point of my post (the GPP). XML brings some fine things to the table; I've spent the last year working heavily with XML by my own choice. But it's not a great file format. Many people see just the file format, and think that XML sucks. My point was to bring other aspects of XML to light, aspects that make XML a good thing overall.
And what's your propos
I really like XSLT for code generators, with the meta-data in XML. I do, however, miss the sheer perversity of using Access VBA to generate Java.
For the project I'm working on, I considered doing just that. In the end, I decided it would be easier to use XSLT to transform the XML into domain-specific Lisp sexps, and then use Lisp to transform the data into the code format I need. But it certainly is fun!
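The "sexps in drag" view is easy to see with a direct conversion. This is a minimal Python sketch of an XML-to-sexp step, not the XSLT-then-Lisp pipeline described above; the attribute list `(@ ...)` loosely follows the SXML convention, and the `<item>` document is hypothetical.

```python
import xml.etree.ElementTree as ET


def quote(text: str) -> str:
    """Quote a string Lisp-style, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(elem: ET.Element) -> str:
    """Render an element as (tag (@ (attr "value") ...) content ...).

    Whitespace-only text is dropped, and namespaced tags come out in
    ElementTree's {uri}local form, so this is a sketch, not a full mapping.
    """
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"({k} {quote(v)})" for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(quote(elem.text.strip()))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail and child.tail.strip():  # text following a child element
            parts.append(quote(child.tail.strip()))
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    doc = ET.fromstring('<item id="3"><name>Widget</name><price>9.95</price></item>')
    print(to_sexp(doc))
```

From there, a Lisp reader can load the output directly, which is what makes the domain-specific Lisp step convenient.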
I got quite a few responses here, but yours gave me a real chuckle; thanks!
You're stupid.
Ah, personal attacks, the last refuge of an argument without decent support. Or is it the first? It's so hard to remember.
That XML is extensible and self-describing gives it a huge advantage over cobbling together your own file format.
You're comparing two orthogonal attributes. One attribute is the power of the language, that is, that it's extensible and self-describing. The other attribute is that it's a standard.
You can have languages in any quadrant of the plane you just described: powerful standard languages (sexps), powerful non-standard languages (how most good programming languages start out), weak standard languages (HTML 1.0), and weak non-standard languages (a deluge of config file formats comes to mind). The fact that XML is extensible and self-describing has nothing to do with the fact that it's a standard rather than something cobbled together by me (or some other random hacker). Somebody can cobble together a language that is extensible and self-describing, and a committee can standardize a format that is neither. So claims that a language's position on one axis gives it an advantage over languages at some position on the orthogonal axis are specious.
Not a big fan of several of the answers. The first batch of questions has some things that are quite wrong (virtual memory is implemented by time-sharing?!?), and there are no comments correcting them. At least the one that starts with "ls -ltra" has comments, although the main page doesn't make that clear.
One problem with providing answers to interview questions is that they're almost useless. If the interviewer knows the correct answers, they don't need them. If the interviewer doesn't, then the questions must be crafted to have only one correct answer (e.g., "What does UDP stand for?"), and such questions are often teh suck. Otherwise (still assuming the interviewer doesn't know the problem domain), you end up with situations like the Windows/Unix file-sharing question, where the interviewer expects to hear NFS while many respondents would reply Samba. Open-ended questions, such as "what does [technology] do," are the worst in this scenario. So I don't think providing answers helps.
Beware also of "opinion" questions, such as "what is the main advantage of symlinks over copies". The question on your site says that permissions are shared, while I think that the main advantage is that modifications are shared. Somebody coming from an embedded systems background may well have good reason to say that the main advantage is disk space.
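The "modifications are shared" answer is quick to check for yourself. A minimal sketch, assuming a POSIX system (on Windows, `os.symlink` may need extra privileges); the file names are hypothetical.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "config.txt")
    copy = os.path.join(tmp, "config-copy.txt")
    link = os.path.join(tmp, "config-link.txt")

    with open(original, "w") as f:
        f.write("v1\n")
    shutil.copy(original, copy)  # an independent duplicate of the bytes
    os.symlink(original, link)   # a new name that points at the original path

    with open(original, "w") as f:  # modify the original afterwards
        f.write("v2\n")

    with open(copy) as f:
        via_copy = f.read()
    with open(link) as f:
        via_link = f.read()

# The copy still holds the old contents; the symlink sees the edit.
print(via_copy.strip(), via_link.strip())
```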
I guess my point is, it's perilous to interview for Linux folk if you don't know enough Linux to deal with a variety of correct answers.
They certainly don't teach us that in high school.
That's because history is written by the winners.
This suspension of the Bill of Rights at the sole discretion of the Administration is literally an unprecedented extension of authoritarian power to the President.
It's not unprecedented.
Point one: The request for ID was never mandatory; the airlines had been fighting for it to be mandatory for some time, since they didn't want tickets to be transferable.
Point two: The request for ID by itself is not as serious, in many people's minds, as the fact that we are bound by regulations that we are not allowed to know.
I mean, this information was known for what, 12 years?
You're thinking of Vigor's EULA.
I'm not sure why I put that in there. Of most of the Vigor users I've seen, I can't imagine why I'd want their children.
How does registering the trademark help him go after his competition?
Okay, let's consider alternatives to always-on.
Autodial gets you a good portion of the way there. A good autodialer should take only a couple of seconds; in other words, not much longer than you need to focus on the screen anyway.
A cronjob can fetch your email periodically, so you can glance at your screen and see that you have mail. And you don't care if there's a few seconds' delay on your outbound mail; let your MTA deal with that.
As for web browsing... hmmmm, that's a bit tougher.... Okay, here's one. Put in a proxy. If the net connection is up, it just works transparently. (And by the way, Squid really does seem to speed up my web fetch times, even from the same computer!) If the connection is down, the proxy brings it up, sure, but what should it do in the meantime? Well, if you're visiting /., then it says "Nothing to see here, move along". If you're not, then it redirects to the same URL with a typo (so you'll assume you screwed up), and then displays a parking page. Okay, that sounds pretty authentic.
IM? Piece of cake: grab an IRC server and a bunch of Eliza-bots.
Okay, you're all set! Always-on experience, on a dialup budget!
Hardly. Anybody who's watched movies since the 80s knows that when a server is overloaded, sparks shoot out loudly and the server emits a high-pitched whine just before exploding.
Use a VERB, VERB, VERB,
He did. According to the 1912 Webster's (easily available online), "period" can be a transitive or intransitive verb.
Since the OP was an AC, the GPP decided to dub him "Period" as a moniker to remind the GPP of his crime, and his first and third usages were as a noun of direct address to emphasize this (as in, "Sir, yes, sir!"); the part after it was a modifier of the final "period". The middle usage of "period" was as an intransitive verb.
MD5 from Fourmilab can do both of those jobs. It's fine (AFAIK) for checking downloaded binaries. For cryptographic purposes (like what I proposed), you'll want some crypto knowledge for anything serious, but it should be fine for "toy" usages like claiming /. posts.
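For the download-checking job, a rough equivalent in Python's `hashlib` (not the Fourmilab tool itself) looks like this. It is a minimal sketch; the helper names are hypothetical, and swapping in `hashlib.sha256` is a one-line change if the site publishes a SHA-256 sum instead.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file, read in chunks so large downloads don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare against a checksum copied from the download page (case-insensitive)."""
    return md5_of_file(path) == published_hex.strip().lower()


if __name__ == "__main__":
    # Stand-in for a downloaded binary.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"pretend this is a downloaded binary")
        path = tmp.name
    try:
        print(md5_of_file(path))
    finally:
        os.remove(path)
```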
Another good idea: run a string along the conduit. That way, when you have to pull something later, you can pull it on the string (along with a new string). It's easier than using a fish tape, and (in my not-so-experienced opinion) there's less concern about cracking fiber.
When you're choosing the conduit's thickness, don't forget that you're likely to have some runs with some thick bits of cable; for example, your home entertainment center may eventually have RG6 (for the TV cable), cat5 and/or fiber (for the home entertainment PC and/or TiVo), four pairs of speaker wire (to the 7.1 system's surround speakers), a stereo pair of audio signal wires (to the house music distribution panel), plus some stuff I haven't considered. You'll need some more room in the bends to make sure that there's plenty of space and cables don't get kinked; cable kinking can do icky things to signals even when it doesn't affect DC.
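A rough way to turn a cable list into a conduit size is a fill-ratio calculation. This is a minimal sketch under stated assumptions: the cable diameters below are hypothetical placeholders (measure your own), and the 40% default is the fill figure commonly used for three or more conductors in the US National Electrical Code's conduit-fill table; check your local code. It covers straight runs only and does not account for the extra room the bends need, so round up.

```python
import math

# Hypothetical cable bundle: (description, count, outside diameter in mm).
# The diameters are placeholders; measure or look up your actual cables.
BUNDLE = [
    ("RG6 coax", 1, 7.0),
    ("Cat5", 1, 5.0),
    ("fiber", 1, 3.0),
    ("speaker wire pair", 4, 4.0),
    ("audio signal pair", 1, 5.0),
]


def min_conduit_id(cables, fill=0.40):
    """Smallest inner diameter whose cross-section the cables fill to `fill`.

    Total cable area is sum(n * pi * d**2 / 4); dividing by the fill fraction
    gives the required conduit area, and solving pi * D**2 / 4 = area for D
    leaves D = sqrt(sum(n * d**2) / fill).
    """
    return math.sqrt(sum(n * d * d for _, n, d in cables) / fill)


print(f"minimum inner diameter: {min_conduit_id(BUNDLE):.1f} mm")
```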
I'm no architect, so I don't know how much your choice of building materials here is going to affect fire risk. Talk to a pro to make sure that the conduit doesn't make your home into a firetrap (by channeling fire to all the house walls quickly). You may need to use plenum cables at some points. But again, I'm not a pro.
When making a reference comment as an AC, you may want to include an MD5 sum of a phrase of your choosing. That way, when you refer back to it, you can demonstrate that it was really you.
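A minimal sketch of that commit-and-reveal trick with Python's `hashlib`; the function names and the sample phrase are hypothetical. The phrase has to be long and hard to guess, because anyone can hash candidate phrases and compare them against the posted sum.

```python
import hashlib


def commit(phrase: str) -> str:
    """MD5 hex digest of a secret phrase; post this along with the anonymous comment."""
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


def verify(phrase: str, posted_hex: str) -> bool:
    """Later, reveal the phrase; anyone can rehash it and compare with what was posted."""
    return commit(phrase) == posted_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical phrase; a real one should be long and unguessable.
    secret = "my long and unguessable phrase"
    posted = commit(secret)
    print(posted)
    print(verify(secret, posted))
```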
The problem is, politics is a fork bomb.
And each committee does a setsid, so you can't kill off process groups wholesale.
Where was it that you remember these roadblocks?
The eastern border, in the Mojave Desert; I was driving from Texas. I think it was on I-40-- I have a vague memory of an oil change that I think was in Needles-- but it may have been I-10 or somewhere else in that region.
Last time I drove over the state lines was when I moved to California. But at that time, they had roadblocks set up to ask everybody if they were carrying any fruits or veggies.
So possibly those same roadblocks could sign off a mileage log when you enter or leave the state. Purely voluntary, but it's an easy way for you to prove that you were driving X miles outside of the state.
I'm confused. What does LDAP have to do with single sign-on? I thought it was just to manage directory information.
The third example has nothing to do with it as far as I can see. Care to enlighten me on that one?
I'm not the AC who criticized, but my best guess is that there's no antecedent of "it". Not something that I feel is worth griping about in this case.
In the first sentence, I don't feel that the comma is a problem. But the spelling is, and the antecedent of "they" is uncertain.
Obviously, I don't hold people to perfect grammar in /. posts-- myself included, as you can see.
A DVD decoder in Windows isn't a stand-alone application, but an addition to the DirectShow architecture, which still is the most powerful and easy to use multimedia rendering solution available on the desktop.
How do you figure?
I'm not looking to start a fight here, but why do you feel that DirectShow is more powerful and easier to use than QuickTime?