ttfkam · Slashdot Mirror

Re:Too many 'this stuff sucks' moments on The Future of XML · 2008-02-08 13:11 · Score: 1

Unless some of your data requires a different character encoding. Then your JSON, byte offset, or pipe delimited solution falls down. Then of course there's the character escaping situation. With JSON, every quote character must be escaped. With byte offsets, you are wasting space and are limited to a maximum length for each entry (the amount of the byte offset). With pipe-delimited data, escaping embedded pipe characters is a pain in the ass -- sure you can just put those fields in quotes, but then you also have quote escaping to deal with.
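
To make the escaping point concrete, here is a minimal Python sketch using only the standard library. The record contents are made up for illustration. It serializes the same three fields as JSON and as pipe-delimited text, and shows that each format needs its own escaping rule as soon as a field contains a quote or a pipe.

```python
import csv
import io
import json

# Hypothetical record: one field with quotes, one with an embedded pipe.
record = ['He said "hi"', "a|b", "plain"]

# JSON: every embedded quote character is backslash-escaped.
json_text = json.dumps(record)

# Pipe-delimited: fields containing a pipe or a quote get wrapped in quotes,
# and embedded quotes are then doubled.
buffer = io.StringIO()
csv.writer(buffer, delimiter="|", lineterminator="\n").writerow(record)
pipe_text = buffer.getvalue()

# Both formats round-trip, but only because each applies its own escaping.
json_back = json.loads(json_text)
pipe_back = next(csv.reader(io.StringIO(pipe_text), delimiter="|"))

print(json_text)
print(pipe_text, end="")
```

Neither result is wrong; the point is only that the "simple" formats stop being simple once the data contains their own delimiters.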

And then finally there's the issue of security. With XML, you're reading in data. Period. With JSON, you are eval()-ing code. In most cases, this is just a passed data structure or two. Who's to stop the XSS attack with malicious JSON payloads? Likely? No, but with the XML solution, it's less than likely.
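
The comment above is about JavaScript's eval(). As a rough analogue (an assumption on my part, not anything from the original thread), this Python sketch shows the difference between handing untrusted text to a real JSON parser and treating it as code. The hostile payload is hypothetical and is never executed here.

```python
import json


def is_plain_json(text):
    """Return True only if text parses as JSON data; nothing is executed."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


data_payload = '{"user": "ttfkam", "score": 1}'
# Hypothetical hostile payload: valid Python code, but not valid JSON.
code_payload = '__import__("os").getcwd()'

accepted = is_plain_json(data_payload)
rejected = not is_plain_json(code_payload)
print(accepted, rejected)
```

A strict parser rejects anything that is not data, which is the property the comment credits XML with; eval()-style loading has no such gate.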

As for the difficulty in reading XML, do you have the same difficulty in reading HTML too? Or is it that there is well-written XML and poorly-written XML, just like HTML?

Re:Too many 'this stuff sucks' moments on The Future of XML · 2008-02-08 13:03 · Score: 1

And conversion to ASN.1 dictionaries and data transport is far easier and more reliable when used in conjunction with xml schema languages so as to enumerate the element/attributes prior to dealing with instance documents.

Of course the fact that convertors from XML to ASN.1 and back again exist aids the previous post far more than your own. The point to the previous post was that parsers and libraries already exist and in such variety that custom parsers and wire protocols are largely unnecessary except in corner cases.

It may in fact be the case that you live in those corner cases, but that doesn't address the other 98% of developers who do not.

Your example doesn't have to be verbose on The Future of XML · 2008-02-08 12:54 · Score: 1

The following in RelaxNG compact syntax might be more palatable to some:

element myelement { xsd:token { minLength="1" maxLength="20" } }

Great post. I heartily enjoyed it. Cheers!

Re:Too many 'this stuff sucks' moments on The Future of XML · 2008-02-08 12:41 · Score: 1

You remind me of those old gopher enthusiasts who referred to images as unnecessary and bloated when faced with the web and HTTP.

If you can create a browser that handles full graphics including variable opacity, correct font metrics, and is easy to develop content for but uses substantially less memory than any of the browsers on the market, I will give you the contents of my savings account now as a show of gratitude.

The reality is that people like looking at pretty pictures. People like looking at non-jaggy shapes and curves. And while the original KHTML renderer was only a couple hundred kilobytes, it was also capable of rendering only what a couple hundred kilobytes could muster. Then again, its memory footprint was substantially larger than a couple hundred kilobytes, but I don't blame Konqueror for that; rendering fonts well and laying out graphical user interfaces requires large amounts of memory. Luckily for us, a large amount of memory is relatively cheap.

As for there being too many specs, no one's pointing a gun to your head to use them all. Use XML, DOM, RelaxNG, with a sprinkling of XPath and be on your way. If those don't suit all of your use cases, try adding another. Anyone that suggests that a small number of specs can handle all eventualities in a complex world is either a liar or being foolish. Which are you?

Re:Too many 'this stuff sucks' moments on The Future of XML · 2008-02-08 12:29 · Score: 1

S-expressions:
* No non-ASCII character set support
* No Unicode support
* Does not fail fast on malformed data
* Only really works with LISP
* No good tool support

Sorry, that's not a viable alternative to XML.

This is why you validate against a schema on The Future of XML · 2008-02-08 12:24 · Score: 1

Just because DTDs suck doesn't mean you can't use RelaxNG or XML Schema, both of which allow you to constrain the values allowed for text or an attribute. As for blowing up when an element name consists of 100MBs of 'a' *AND* you are validating that XML document against a schema, that's a bug in the parser you're using, and you should file a bug report.

Re:I don't understand... on The Future of XML · 2008-02-08 07:58 · Score: 1

Because most of us have better things to do than write yet another data parser. I've written plenty. I'm over it. Now I want to work on other things.

As for the XML my CAD software uses, I'd convert it to other formats using XSLT or STX, I'd extract values from it using XPath or XQuery, and I'd validate it for correctness using DTDs, XML Schema, or RelaxNG.
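
As a small illustration of pulling values out with XPath instead of a hand-written parser, here is a sketch using Python's standard library, which supports a limited XPath subset. The CAD-like document, element names, and attribute names are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical CAD-style export; the schema is made up for illustration.
cad_xml = """<drawing units="mm">
  <part id="p1"><width>40</width><height>25</height></part>
  <part id="p2"><width>10</width><height>5</height></part>
</drawing>"""

root = ET.fromstring(cad_xml)

# Every part width, selected by path rather than by custom parsing code.
widths = [float(w.text) for w in root.findall("./part/width")]

# One specific value, selected with an attribute predicate.
p2_height = root.find("./part[@id='p2']/height").text

print(widths, p2_height)
```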

And I would never have to write a parser or big logic block just for some input. All that time saved can be spent on things like -- oh, I don't know -- my actual program and the problem I'm actually trying to solve.

But you go on ahead writing your parsers. Be sure to put your five hundred parsers on display in some museum. I'm sure other folks will find them fascinating.

Re:"How will you use XML in years to come?" on The Future of XML · 2008-02-08 07:49 · Score: 1

Because they only work with LISP and LISP-like languages, because they don't fail-fast on invalid data, because you can't validate the input easily, because they don't support multiple character encodings, because the default character encoding isn't Unicode compatible...

Should I go on?

S-expressions? on The Future of XML · 2008-02-08 07:46 · Score: 1

Oh! You mean that thing that lacks Unicode character support, that lacks any easy way to define the character set, and has plenty of other problems, like for example its inability to fail fast?

No, not fail-safe, I mean fail fast. As soon as an XML parser comes across a closing tag that fails to match the tag that opened it, the parser throws a parsing error. Not so with S-expressions. With your LISP structural model, the parser/interpreter must find the closing parenthesis that fails to match. This means that it may parse to the end of a multi-kilobyte or multi-megabyte data file before it can ever know that anything is wrong. How is that more efficient?

(Answer: it's not.)
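
The fail-fast behavior on the XML side can be sketched with Python's incremental expat bindings. The document chunks below are made up; the sketch feeds them one at a time and reports which chunk triggered the error, so you can see the parser stop at the mismatched closing tag without ever reading the rest of the data.

```python
import xml.parsers.expat


def first_failing_chunk(chunks):
    """Feed chunks to an incremental XML parser.

    Returns the index of the chunk that raised a parse error,
    or None if every chunk was accepted.
    """
    parser = xml.parsers.expat.ParserCreate()
    for index, chunk in enumerate(chunks):
        try:
            parser.Parse(chunk, False)  # False: more data may follow
        except xml.parsers.expat.ExpatError:
            return index
    return None


# Hypothetical stream: the second chunk closes <item> with the wrong tag,
# and a much larger chunk of valid-looking data follows it.
document = [
    "<root><item>",
    "text</wrong>",
    "<item>more</item>" * 1000,
    "</root>",
]

failed_at = first_failing_chunk(document)
print(failed_at)
```

The large third chunk is never handed to the parser, which is the whole point of failing fast.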

Then of course you would need to validate the input in the S-expressions. You *do* validate your input, right? Do you do it manually with LISP? How would you do it with any other language? After all, this solution of yours has to work with more than one programming language, right? For some reason, validating the input from an S-expression in C doesn't sound fun.

XML has DTDs (although I don't particularly like them), XML Schema, RelaxNG, RelaxNG compact syntax, etc. All of them work, and all of them have a strong following, so there's not much danger in choosing the "wrong one." Just use what you prefer. Well, of course, that is if you're using XML and not S-expressions. With S-expressions, you're just SOL.

So have fun in your own little S-expression world.

The FCC? on Fixing US Broadband Would Cost $100 Billion · 2008-01-31 12:18 · Score: 4, Interesting

You mean the same FCC the majority of whose members are appointed by the president of the United States? Or how about the SEC that allowed all those baby bells to get back together again? The chairman of the SEC is appointed by the president as well.

You're right that Bush wasn't president from 1994-2000; however, the US was at the forefront of technology and internet access at that time. After the tech bust in 2000 (obviously not Bush's fault since he wasn't president yet) there was the opportunity to invest in infrastructure and prepare for the eventual economic recovery. Instead Bush gave out tax cuts right and left. Nice idea for stimulus except that he gave mostly to the richest who, contrary to the revisionist history of the Reagan era, do not trickle those funds efficiently down to the working class. He then stacked the FCC, SEC, and many other agencies with party hacks who didn't know the first thing about the real world, only their ideology.

So yeah, basically Bush takes a fair amount of blame here. Sure he had help, but that doesn't excuse him. Sure he had other things to do, but that doesn't excuse him.

Other things he had to do:
* Put someone competent in charge of FEMA
* Read the reports from various agencies and his predecessor about some guy named Osama
* Protect and defend the Constitution of the United States

Instead he spent time funneling money to his cronies and vetoing bipartisan child health care bills.

So now we have an infrastructure that is woefully behind and will take $100 billion to fix. Hurray us! Japan, South Korea, and other countries have faster speeds available than *anywhere* in the US. This isn't even an argument about per capita speeds or the fact that we've got a larger population over a larger area. Our fastest simply ain't that fast.

It's true that Congress takes its share of blame too. Lucky for my argument, it's been a Republican-controlled Congress since '94 and until very recently. There's been record government spending during Bush's tenure when he never vetoed a Republican bill (other than stem cell research funding) and yet we're still behind. Do the math.

The Mantle of Galileo on The Nuclear Power Renaissance · 2007-11-15 13:51 · Score: 1

"Alas, to wear the mantle of Galileo it is not enough that you be persecuted by an unkind establishment; you must also be right."

- Dr. Bob Park

-----

Hmmm... Go back and re-read my previous post. Be sure to keep an eye out for where I said that it would not be possible. Don't be too surprised when you can't find it. Your comment, my dear slashdotter, is an example of a Straw man fallacy; you are presenting my argument in a distorted light so as to easily refute it. Unfortunately, the argument you are refuting isn't mine.

I do hope that one day you will learn the difference between advocacy of armed forces in populated areas and requesting a prototype.

I also truly hope that you will learn why one should not base national energy policy on technology that has not been invented yet.

You Win; New Challenge on The Nuclear Power Renaissance · 2007-11-14 15:46 · Score: 1

And this goes to show why one needs to be careful with these challenges.

New challenge: show me a prototype that could convert solar power from orbit to the surface of the Earth in a controlled fashion and has a snowball's chance in hell of producing a statistically significant portion of the US electricity usage (5 trillion kilowatt-hours/year).

Learn to read on The Nuclear Power Renaissance · 2007-11-14 15:42 · Score: 1

I said prototype, not proposal. BIG DIFFERENCE.

The CIA spent a great deal of research money (read: tax dollars) to train people to effectively channel the thoughts of others or events not immediately available to them. In other words, they spent millions researching ESP. After all, it would have been a very "disruptive game changer" in the intelligence community.

Too bad it didn't amount to a pile of shit -- not even a *big* pile of shit.

Fine on The Nuclear Power Renaissance · 2007-11-14 15:37 · Score: 1

Fine, show me the prototype of a high-to-very-high-power microwave beam to electricity convertor.

Not communication, power.

Re:The thing is on The Nuclear Power Renaissance · 2007-11-14 14:03 · Score: 4, Insightful

But, really, the only reason we don't have space based solar power already is because it would devalue fuel and energy and destroy every power structure on earth that relies on it, and that's a tough sell politically. Capitalism relies on scarcity to keep everyone obedient.

That or the fact that no one has ever beamed energy from a satellite to a terrestrial site. Ever. Remember that thing called "an atmosphere?" So we're talking lasers, right? You want to show me where the prototype exists to convert a very-high-powered laser beam to an electricity source? Just one will do. Go on. Show me one example.

Won't sell because of a power conspiracy? Give me a break. If a company could do this already, they'd be launching satellites on a daily basis. Think about it for a moment: you could be the company that supplies most of the world's power while waving the banner of environmental responsibility. But *no one* has even built *a prototype* because of your supposed cabal?

I think your tin foil hat needs to be cleaned; you've been wearing it far too long already.

Re:Disposal? on The Nuclear Power Renaissance · 2007-11-14 13:56 · Score: 2, Interesting

Option 1: Vitrify (mix with glass to prevent chemical interaction with the environment) and drop to the bottom of the ocean at a subduction zone.

Over a short time the material will be covered in silt and mud. Over a long time it will be drawn into the Earth's crust and mantle. I'd call that a fairly permanent solution.

Option 2: Repeal the law banning enrichment for domestic power purposes.

Currently only about 2% of the fuel potential is actually used in today's power plants. If you can reprocess the spent fuel, separating out the junk from the readily fissile material, you can substantially reduce both the volume of waste and the amount of time the waste is dangerous.

Option 3: Move to thorium-based reactors.

For Thorium reactors, the fuel cycle is far more efficient and leaves far less waste and waste that is dangerous for a far shorter amount of time.

Option 4: Move to fast neutron reactors.

The fuel cycle is, again, far more efficient and leaves shorter-lived waste as well as far less waste.

-----

Those are four "good answers." No large-scale energy generation is going to be warm and fuzzy. Sorry, but that's the brutal truth. When you're talking about trillions of kilowatt-hours per year, it is absolutely the search for the lesser of many evils.

Think solar will solve our issues? We're having supply problems with silicon as it is. No, we're not running out of sand. Photovoltaics require clean rooms and much of the same infrastructure as computer chips. Lately, the price of computer chip materials has been increasing because of increasing solar panel production. What? Beam it down from space? Show me a prototype and I'll consider it. Until we see a proof of concept, it would be ridiculously stupid to base a nation's energy policy on it.

What? The solar panels that can be "painted?" Where was the prototype for that again? Exactly. Prototype comes before small-scale production. Small-scale production precedes large-scale production. If there's no prototype, you can't even begin to seriously consider policy based upon large-scale production.

That said, I think we should spend time with wind power, just not the windmill variety. Those suck.

Minimum 10MPH wind + Maximum 40MPH = Not Good Enough For a Nation.

Read about kite versions instead and why windmills just don't cut it. But once again I would want to see a proof of concept before committing.

Re:Are you serious? on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-06 07:06 · Score: 1

I never said there weren't good reasons for them to do so. Reread my post. I *never* said that. I'm simply saying that the choice has not worked out well in the long term now that it touches beyond just binary numerics; everyone has a computer now and the 1,000/1,024 dichotomy is causing confusion. In addition, the difference between 1,000 and 1,024 is greatly exacerbated once you hit the ranges of giga-, tera-, and beyond. An honest question: do you think early computer scientists would have used these terms had they envisioned terabyte storage units at the time?
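
For anyone who wants the numbers behind "greatly exacerbated," here is a quick Python sketch of the arithmetic. The function name is my own; the only inputs are the two competing definitions of each prefix.

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]


def binary_excess_percent(power):
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {binary_excess_percent(power):.2f}%")
```

The gap starts at 2.4% for kilo and is close to 10% by tera, so the ambiguity costs more with every step up the scale.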

1,000 is not a number we are all comfortable with. If it were then we would not have developed a notation to shorten it. If you show "1,000" to any individual over the age of seven, they will know what it means. If you show "0xFF" to a typical adult, they will stare at you blankly.

If you say that a kilometer is 1,000 meters, anyone with an eighth grade education will either understand or pick it up in less than 30 seconds.

If you say that a kilobyte is 1,024 bytes and a megabyte is 1,048,576, that same person will stare at you blankly, will not pick it up in less than 30 seconds, and will forget it within the hour.

I can't believe I actually had to persuade someone that kilo=1,000/mega=1,000,000 is an easier concept for general society than kilo=1,024/mega=1,048,576. And while I am aware that computing is not the same as civil engineering, you should be aware that the two fields -- in addition to the thousands of other fields -- need to communicate with one another from time to time. That was my point, not that they are the same field.

Re:1GB is really 1,000,000,000 bytes on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-06 06:44 · Score: 1

Exactly my point, which is why the names for binary units should change: to avoid confusion.

Re:GIBIBYTE SOUNDS RETARDED. on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-06 06:41 · Score: 1

Too bad that asshole hadn't ever left his parents' basement. Pot, meet kettle.

Re:Other Linux Java Options? on Red Hat Joins Open Source Java Project · 2007-11-06 06:39 · Score: 3, Informative

What does the uncompressed local copy have to do with download times? 14MB compressed takes just as long as 14MB uncompressed. If you think that your CPU can't handle fast decompression, just think of all of the web sites that gzip their content for network efficiency.

As for the complaint about docs, are you serious? Are you seriously complaining that there is too much documentation available in HTML format? And optional documentation at that? Think about what you're saying for a second: that you consider it a drawback that every class, method, and member of the JRE is consistently documented in detail.

GUI: AWT versus Swing are native widget peers versus internally rendered widgets.

RPC: RMI, CORBA, and XML-RPC/SOAP are for the following in order: RPC in a 100% Java environment, cross-platform binary RPC, and XML text-based RPC. There is a place for each of those.

XML parsers: are you referring to the SAX, DOM, and StAX parser APIs -- which would make three? Or do you mean two parsers like Crimson and Xerces? I think the former is self-evidently a good thing. The latter is due to compatibility and consistency through multiple releases, as the older parser behavior may be necessary for an older app even if it's a little slower or more memory inefficient.

I can see your argument against including a scripting language, but Sun wanted to include a reference implementation of their pluggable scripting interface.

I/O: Blocking vs. non-blocking. What's the problem? Both have their uses.

What you call bloat, some would call completeness. Let's compare against some other popular languages.

Common Lisp: 10MB
Latest Python download for OS X: 17.9MB
Latest Perl download for OS X: 33.5MB (Linux version is between 18.9 and 24.8MB)
Latest Ruby (without Rails) download for OS X: 13.71MB

But don't take my word for it. Download for yourself. The only reason these other languages seem smaller to you is because they are bundled seamlessly with your Linux distribution.

Want database access, RPC, non-blocking I/O, XML parsing, etc. from those languages? Too bad, that's another download. Sure there are resources like CPAN, but why are their cores so bloated? Somehow Java is able to provide all of those "bloated" APIs at about the same download size as those languages that lack them.

And don't get me started on C and C++. They don't even have a standard database layer, XML library, or the like for you to download separately. Learned one non-blocking I/O library? Too bad, your new company uses a different one. Do you think ODBC is a good solution? Obviously you've never programmed for it.

I'm sure I could go on, but you get the picture.

Are you serious? on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-02 12:35 · Score: 1

Are you seriously suggesting that the fact that 2^10 and 10^3 are close together is merely a historical coincidence? Seriously?

0, 1, 2, 3, 4, 5, 6, 7, 8, 9... oops, ran out of numbers. I know! Let's assume there is a zero in front of the 9 like this: 09. So the front number goes from 0 to 1 and we reset the second number. We end up with 10.

Why 10? What's so special about it? Wouldn't 8, which is a power of 2, be more useful? Or some other base? Nope, we've got ten fingers and ten toes. We've got ten on the brain from a young age, hence my term "base-10 mind."

They chose 1,024 because it was the closest they could get with binary to 1,000, a number we are all comfortable with.

PMBjornerud said it best in his/her post:

Really, this is exactly why the SI-standard was introduced. A single standard across industries.

Good luck trying to explain to a civil engineer why we redefined the kilo to 1024. Bonus points if he considers it "professional" to redefine international standards to support nice hacks based on internal workings of our equipment.

It doesn't hurt computer scientists and programmers to say mebibyte instead of megabyte or gibibyte instead of gigabyte. It would solve the issue for everyone involved in that computer technology would accurately and unambiguously codify their distinct needs while everyone else would know exactly what we intend to signify. The only impediments are laziness and stubbornness: resisting change for the sake of resisting change. There really isn't any other good reason not to switch other than, "We've always done it that way, and we don't want to change."

Re:1GB is really 1,000,000,000 bytes on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-02 09:47 · Score: 1

Bits and bytes may not be SI units, but the prefixes kilo-, mega-, giga-, tera-, etc. most certainly are specified by SI.

Quoting from the NIST page on binary SI units:

Once upon a time, computer professionals noticed that 2^10 was very nearly equal to 1000 and started using the SI prefix "kilo" to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous "everybody" bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.

Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today "everybody" does not "know" what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2^20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10^6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), "1.44 MB" diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.

Faced with this reality, the IEEE Standards Board decided that IEEE standards will use the conventional, internationally adopted, definitions of the SI prefixes. Mega will mean 1 000 000, except that the base-two definition may be used (if such usage is explicitly pointed out on a case-by-case basis) until such time that prefixes for binary multiples are adopted by an appropriate standards body.

Bold items are my own emphasis.
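
The three competing megabytes in the quote can be checked against the diskette itself. This Python sketch assumes the standard high-density geometry (80 tracks, 2 sides, 18 sectors per track, 512 bytes per sector), which is not stated in the quote; the variable names are mine.

```python
# Assumed geometry of a high-density 3 1/2 inch diskette.
SECTORS = 80 * 2 * 18          # tracks x sides x sectors per track = 2880
BYTES_PER_SECTOR = 512
capacity = SECTORS * BYTES_PER_SECTOR   # total bytes on the disk

decimal_mb = capacity / 1_000_000       # SI megabyte
binary_mb = capacity / 2**20            # 1 048 576-byte megabyte
hybrid_mb = capacity / (1000 * 1024)    # the 1 024 000-byte "megabyte"

print(capacity, decimal_mb, binary_mb, hybrid_mb)
```

Only the hybrid definition yields the advertised 1.44; the same disk is about 1.47 MB in SI terms and about 1.41 in binary terms.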

Re:1GB is really 1,000,000,000 bytes on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-02 09:35 · Score: 1

Good luck finding a 1 gigabit network adapter that transmits 1,024 megabits of data.

Re:1GB is really 1,000,000,000 bytes on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-02 09:30 · Score: 1

Math and, later, CompSci folks invented the concept of a byte, but they did not invent the prefixes kilo, mega, etc. Those were already in common use before the first electronic computer with vacuum tubes was a glimmer in Alan Turing's mind.

Their ORIGINAL meaning corresponds to the SI standard. Early computer folks used those prefixes INCORRECTLY to approximate the amount of memory/storage in use because 2^10 is close to 10^3. The approximation made it easier to grasp for the base-10 mind, but it's still just an approximation. Unfortunately, as seen here, the approximation became blind dogma.

EVERY OTHER FIELD OF STUDY uses kilo- to signify one thousand and mega- to mean one million. When one industry uses a term one way and thousands of other industries all use a term in a consistent but different way, what justification can you give for the single inconsistency?

You don't need to rewrite all of your software. Just start writing MiB instead of MB from now on and accept the inconsistency in older software. It's really not that hard to change, and it will work itself out as older software falls by the wayside.

Pop quiz: how many bits per second are theoretically transmitted by a GigE network adapter?
Hint: it's not a power of two.
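
A quick check of the hint, as a Python sketch. It takes the nominal GigE rate to be 10^9 bits per second; the helper name is my own.

```python
def is_power_of_two(n):
    """True if n is a positive integer with exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


GIGE_BITS_PER_SECOND = 10**9   # nominal rate, decimal giga

gige_is_binary = is_power_of_two(GIGE_BITS_PER_SECOND)

# The same rate expressed in binary units, for comparison.
mebibytes_per_second = GIGE_BITS_PER_SECOND / 8 / 2**20

print(gige_is_binary, round(mebibytes_per_second, 2))
```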

Even our own industry can't get it straight.

I wish I had mod points on Seagate Offers Refunds on 6.2 Million Hard Drives · 2007-11-02 09:14 · Score: 1

Kudos for the excellent point.

User: ttfkam

Comments · 1,083