dwheeler · Slashdot Mirror

Implementation matters! on Star/OpenOffice XML Format To Become ISO Standard? · 2004-09-27 06:12 · Score: 2, Insightful

You're quite correct, but you're missing an important factor: widespread inexpensive implementations.

ISO's OSI stack was standardized before it was really implemented, with the result that the implementations were large, clumsy, and clunky, if you could get them at all. This is a big risk of standardizing something before you implement it. SGML at least had some implementations, but the implementations were hairy (to get all the details right), so the resulting libraries were expensive.

In contrast, OpenOffice.org presumably already implements this specification (or something very similar to it), and is available for free. So the major reasons that OSI lost are gone. Note that XML has done well in the marketplace - they took SGML, simplified it, and implemented things before they declared version 1.0. And TCP/IP is the prime example of trying things out before you declare them as officially a standard.

Sure, there's a battle here, but it's possible.

Certainly, there is a risk of "embrace and extend" becoming an interoperability problem. In the end, consumers need to be the ones guarding against that.

Legislation? long-term and public information on Star/OpenOffice XML Format To Become ISO Standard? · 2004-09-27 04:35 · Score: 2, Insightful

Once it's an ISO standard, I can easily imagine EU-wide legislation requiring that all government documents that (A) must be stored long-term, or (B) are provided to the public, be provided in such a standard format.

Actually, it's a bad idea to depend on ANY single vendor for the format of important records that have to be held long-term. We can still read the Magna Carta, no problem. Has anyone tried to read Microsoft PowerPoint version 2 files? Or WordStar files? Even WordPerfect is increasingly complicated for many people.

For long-term records, I can easily imagine a requirement to store them in an ISO-standard format. OO.o's format is actually especially nice: it's compressed (.zip) and XML-based, so it takes very little space, which is perfect for long-term storage. Even if all the programs stopped working, as long as you knew how to unzip the files, you could view them as XML.
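To make the "unzip it and read the XML" point concrete, here is a minimal sketch using only the Python standard library. It builds a tiny stand-in archive in memory rather than opening a real OO.o file; the member name content.xml matches what OO.o files use, but the XML inside is made up for illustration.

```python
import io
import zipfile
import xml.dom.minidom


def build_sample_document() -> bytes:
    """Build a small zip archive with one XML member (a stand-in, not a real OO.o file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The element names below are invented for this example.
        zf.writestr("content.xml", "<document><p>Hello, archive.</p></document>")
    return buf.getvalue()


def read_member_as_xml(archive_bytes: bytes, member: str) -> str:
    """Unzip one member and return it as indented, human-readable XML."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        raw = zf.read(member)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")


archive = build_sample_document()
pretty = read_member_as_xml(archive, "content.xml")
print(pretty)
```

Nothing here depends on an office suite being installed, which is the property that matters for long-term records.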

For public information, you need a format that any user could read, no matter what their operating system or office programs are. Again, a standard format works nicely. And the fact that OO.o files are compressed is helpful for low-bandwidth users (esp. the poor and those in eastern Europe).

Microsoft's ".doc" format has been used for these purposes, but it's not really good at it. It's really only designed for a single word processor, it's not really documented, it doesn't support standards like XML, etc. And I believe Microsoft's new XML format doesn't even capture all the information from Word (while OO.o's clearly does). The ".rtf" format isn't really that much better. And although they're talking about developing better conversion software, the OO.o software already includes .doc conversion software, which could already be used to support an upgrade.

There's already work to create a standard for PDF to support very long-lived documents that must be available "forever" to arbitrary platforms. It's called PDF-Archive. PDF-Archive looks very useful for its purposes, but it won't support exchange of editable documents; its purpose is to fix everything (such as page breaks and so on).

The world's needed a standardized editable office document format for a long time, where the standard is a real standard that is publicly documented, can be implemented by multiple vendors (without patent royalties/limitations), and isn't controlled by any one company. Maybe the world will finally get such a standard.

Frankly, if there's a standard and the EU pulls off such legislation, that's a big coup. If many governments start releasing files in such formats, then others will want to make sure they can read/write those formats. And if it's a standard, it's much more likely that competitors (like OpenOffice.org itself) will have a chance.

Re:C++ ABI issues? on Linux Standard Base 2.0 released · 2004-09-16 04:41 · Score: 1

That's correct, the hash table isn't stored in simple alphabetic order. If the hash algorithm were ideal, this wouldn't be an issue. The ELF hash algorithm isn't _awful_, but I don't think it's stellar either. There are certainly reports of many collisions, and it would be extremely painful to change the hash algorithm. (If anyone has hard numbers on ELF hash algorithm performance I'd love to hear about it.)
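For anyone who wants to gather such numbers, the classic ELF symbol hash is short enough to sketch. Below is a Python transcription of the function as I understand it from the System V ABI, kept to 32 bits. The symbol list and bucket count are arbitrary, and the chain_lengths helper is my own illustration, not part of any linker.

```python
from collections import Counter


def elf_hash(name: bytes) -> int:
    """Classic System V ELF symbol hash, truncated to 32 bits."""
    h = 0
    for byte in name:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000          # top four bits, if any are set
        if g:
            h ^= g >> 24            # fold them back into the low bits
        h &= ~g & 0xFFFFFFFF        # and clear them
    return h


def chain_lengths(symbols, nbuckets):
    """Count how many symbols land in each bucket (hash modulo nbuckets)."""
    return Counter(elf_hash(s) % nbuckets for s in symbols)


# Arbitrary sample symbols and bucket count, for illustration only.
symbols = [b"printf", b"puts", b"malloc", b"free", b"memcpy"]
lengths = chain_lengths(symbols, nbuckets=3)
print(hex(elf_hash(b"printf")), dict(lengths))
```

Feeding it the real symbol table of a large C++ library, with that library's real bucket count, would give actual chain-length data instead of anecdotes.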

rpm handles multiple libraries painlessly on Two Years Before the Prompt: A Linux Odyssey · 2004-09-16 04:38 · Score: 1

RPM handles multiple libraries painlessly. Use "-i" if you want to install a new library (while keeping the old), and use "-U" if you want to update a library (in other words, remove the old and add the new).

C++ ABI issues? on Linux Standard Base 2.0 released · 2004-09-13 09:09 · Score: 3, Interesting

At one time, there was concern by some that the LSB was trying to freeze the C++ ABI "too soon". See this LWN posting for more info.

I presume that LSB is simply spec'ing existing practice, correct? Or have things changed since that posting? Is this really an issue, even, since a system might be able to support an "old" and "new" C++ ABI by having both the "old" and "new" libraries installed?

Also: if the C++ symbols will be stored as (name space + package + class + method) in that order, ELF is used, and there are many hash collisions, this might create a lot of overhead loading large C++ libraries. The reason: while linking, you'd have to compare a lot of text before matching, because so many symbol entries would have a common prefix that you'd have to keep matching over and over again. Am I reading this correctly?
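Here is a rough sketch of the cost I'm worried about. It counts how many characters a strcmp-style comparison has to look at while walking one hash chain. The symbol names are made-up readable stand-ins, not real mangled C++ names, and the chain contents are hypothetical.

```python
def chars_examined(a: str, b: str) -> int:
    """Characters a strcmp-style comparison inspects before it can decide."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            return n
    # One string is a prefix of the other (or they are equal): one more
    # comparison is needed to reach the terminator.
    return n + 1


def lookup_cost(target: str, chain: list) -> int:
    """Total characters examined while walking a chain until target is found."""
    total = 0
    for candidate in chain:
        total += chars_examined(target, candidate)
        if candidate == target:
            break
    return total


# Hypothetical names: the same three methods with and without a shared prefix.
prefix = "ns::pkg::Widget::"
methods = ["draw", "resize", "paint"]
short_cost = lookup_cost("paint", methods)
long_cost = lookup_cost(prefix + "paint", [prefix + m for m in methods])
print(short_cost, long_cost)
```

Every entry in the chain pays for the whole shared prefix again, so the extra work grows with both the prefix length and the chain length.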

You certainly CAN have multiple library versions! on Two Years Before the Prompt: A Linux Odyssey · 2004-09-10 04:26 · Score: 1

This article is completely incorrect on a key point: You CAN have multiple versions of the same library, indeed, that's been true for a very long time. See the Program Library HOWTO for details.
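The naming convention that makes this work can be sketched in a few lines. As I understand the convention the HOWTO describes, a library's "real name" is libNAME.so.MAJOR.MINOR.RELEASE and its "soname" is libNAME.so.MAJOR; programs record the soname they were linked against, so libraries with different major versions can sit side by side. The library names below are invented.

```python
import re

# Real name: lib<name>.so.<major>[.<minor>[.<release>]]
_REAL_NAME = re.compile(r"^(lib[^/]+?\.so)\.(\d+)(?:\.\d+)*$")


def soname(real_name: str) -> str:
    """Derive the soname (lib<name>.so.<major>) from a library's real name."""
    m = _REAL_NAME.match(real_name)
    if not m:
        raise ValueError(f"not a versioned shared library name: {real_name!r}")
    return f"{m.group(1)}.{m.group(2)}"


# Hypothetical set of installed files for one library.
installed = ["libexample.so.1.0.4", "libexample.so.1.2.0", "libexample.so.2.0.1"]

by_soname = {}
for name in installed:
    by_soname.setdefault(soname(name), []).append(name)

for key, files in sorted(by_soname.items()):
    print(key, "->", files)
```

An old program asking for libexample.so.1 and a new one asking for libexample.so.2 can both be satisfied on the same system.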

Anger/distrust of liars, Sun-dependence, OS pain. on Why is Java Considered Un-Cool? · 2004-08-24 02:46 · Score: 2, Informative

Some other reasons don't seem to get mentioned.

First, there's still anger and distrust of Sun. When Java first came out, Sun promised to help make Java a standard not solely controlled by any one vendor, and Sun started working with ECMA and ISO to make it so. IBM invested over $1 billion with that understanding. Then Sun suddenly decided to take Java out of the standards process, and take complete control over Java. Yes, there's a "Java Community Process", but look at the process: if Sun doesn't like it, it's dead. Period. That's not independence; it's a dictator model. And it's not necessarily benevolent; in an open source software project, you could fork the project if things went really badly (e.g., XFree86), but there's no mechanism for a true 'vote of no confidence' in the current process.

Fundamentally, developing in Java still primarily involves kneeling to Sun. We have lots of C and C++ compilers, with vendor-independent standards for them. Many other languages have standards, too. There's no need to return to a language totally controlled by any single vendor; that's a model from decades ago. Yes, there are other Java implementations, but not many; few others support the GUIs, and none support the massive library that's the primary point of using a language like Java. gcj does great stuff, but try compiling a normal Java program with Swing and other key libraries. C# is heavily controlled by Microsoft, and there are reasons to distrust that too, but at least Microsoft managed to release the language fundamentals to a more neutral party; why can't Sun exceed those low expectations?

And on most systems, implementing a Java system is a pain. It doesn't come with Microsoft, who's actively trying to kill it. It doesn't come with any purely open source software OS (Fedora, Debian, etc.), because it's not open source. This isn't a killing problem, but it does make development of Java applets essentially hopeless -- because it's quite likely that users will NOT have the necessary plug-in. You can do Java application development, and install the necessary libraries -- on servers that's not a big deal, but it's a little more painful on clients for client applications. But at that point Java enters a crowded field: there are LOTS of languages that can be used this way.

There's a lot to like about Java. But Sun has managed, through a series of missteps, to make a lot of people unhappy and avoid Java, even if Java would be a fine fit technically.

Failure to "feel distance" isn't the same thing on Writing Software for Worldwide Distribution Proves Difficult · 2004-08-19 03:54 · Score: 3, Informative

I don't think the difficulty people have in understanding differing scales is the same thing at all. Knowing the locations of major features and countries of Earth is something that can be taught in school - and NEEDS to be taught there.

But understanding the differing scale of things is much harder for human brains to wrap around. Yes, they can be described by measuring distance or travel time, but it's hard to really understand differences in scale until you've been there. E.g., I remember visiting the UK, and some people described "far away" villages which were closer than my daily commute. This is just one of the many reasons that you need to visit a place to really understand it.

Better article: "Independence Way" by Sam Jaffe on Getting Serious About Fuel Cells · 2004-08-13 23:54 · Score: 1

A better article is the (indirectly-linked to) "Independence Way" by Sam Jaffe. It discusses cellulosic ethanol and ethanol reconstituters. Promising stuff.

Good ruling. on Jerry Falwell Wins Dispute Over Fallwell.com · 2004-08-11 03:27 · Score: 2, Insightful

Someone has taken a previously existing name, and has been exploiting it for their own gain by trying to confuse the public. And got caught. The fact that it's Jerry Falwell is immaterial; it's the actions of the other guy that were wrong. This is exactly what was wrong with trying to extort away the Katie.com domain, too. I have my own domain name, and I don't want other people stealing it, or confusing people with subtle variations. This is a good ruling; it protects people everywhere from shams and scams.

You have power. Use it. on Publisher Renames 'Katie.com' · 2004-08-06 09:58 · Score: 1

If nothing else, I hope this episode makes it clear that when technical people band together to protest a wrong, they can sometimes force it to be made right. I've read too much about how "geeks can't get anything changed," "geeks have no power," etc. If you give up, then of course you'll lose - you deserve it.

It looks to me that this sorry episode finally has a reasonable ending: the book will be renamed, and the author actually owns the relevant domain. It's too bad what had to happen in the intervening time, but it looks like a very reasonable ending at least.

So speak up for other issues. Who's speaking up about the problems of software patents (particularly the egregious ones) to those who can do something - have you contacted your Congressman? How about other issues? You won't win all battles, but you'll lose all battles you don't fight.

Open voting consortium & Voter verified receip on Australian Voting Software Goes Closed Source · 2004-08-04 10:28 · Score: 3, Informative

You might want to also check out the Open Voting Consortium (OVC), a non-profit organization dedicated to the development, maintenance, and delivery of open voting systems for use in public elections. OVC is developing a reference version of free voting software to run on very inexpensive PC hardware, which produces voter-verifiable paper ballots.

One real problem with eVACS is that, to my knowledge, it doesn't produce voter-verified receipts yet (please let me know if I'm wrong). Thankfully, the new OSS/FS site identifies this as one of the first things to be added. As noted by places such as the verified voting site, voter-verified receipts are a critical need. In fact, I'd argue that only the counted paper ballots should actually count, and that the vote-creating and vote-counting systems should be separate (using some sort of standard representation on the paper, so that different groups can re-implement each side).
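To show what I mean by separating the two sides, here is a toy sketch. The "contest: choice" line format is something I made up for this example; it is not the format used by eVACS, OVC, or any real system. The point is only that the printing side and the counting side share nothing except the text on the paper.

```python
from collections import Counter


def print_ballot(selections: dict) -> str:
    """Vote-creating side: render selections as plain, human-readable lines."""
    return "\n".join(
        f"{contest}: {choice}" for contest, choice in sorted(selections.items())
    )


def read_ballot(text: str) -> dict:
    """Vote-counting side: parse the printed lines without using the printer's code."""
    selections = {}
    for line in text.splitlines():
        contest, sep, choice = line.partition(": ")
        if not sep:
            raise ValueError(f"unreadable ballot line: {line!r}")
        selections[contest] = choice
    return selections


def tally(ballots) -> Counter:
    """Count (contest, choice) pairs across a pile of printed ballots."""
    totals = Counter()
    for text in ballots:
        for contest, choice in read_ballot(text).items():
            totals[(contest, choice)] += 1
    return totals


# Hypothetical contests and choices.
paper = [
    print_ballot({"Mayor": "Alice", "Bond 1": "Yes"}),
    print_ballot({"Mayor": "Bob", "Bond 1": "Yes"}),
    print_ballot({"Mayor": "Alice", "Bond 1": "No"}),
]
totals = tally(paper)
print(totals[("Mayor", "Alice")], totals[("Bond 1", "Yes")])
```

Because the paper is readable by a person, the voter can check it before it is cast, and a second group can write its own counter to audit the first.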

Re:Other studies: Red Hat Linux 7.1, Debian 2.2 on CPAN: $677 Million of Perl · 2004-07-30 05:59 · Score: 1

You did design the system, write the documentation, write and run the tests, and debug it, right? The estimates take the number of lines of code, and use it to estimate the entire development time. Actually writing the code is often a small fraction of the total development time... but the SLOC is a handy way to estimate the rest of it.

Re:Perl coders make $135k/year? on CPAN: $677 Million of Perl · 2004-07-30 05:28 · Score: 1

Take a peek at the Gigabuck paper, especially section 2.3, which explains how the numbers are derived. It includes salary and all overhead (office space, management, benefits, etc.). Quote: "For programmer salary averages, I used a salary survey from the September 4, 2000 issue of ComputerWorld; their survey claimed that this annual programmer salary averaged $56,286 in the United States. I was unable to find a publicly-backed average value for overhead, also called the 'wrap rate.' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate. Some Defense Systems Management College (DSMC) training material gives examples of 2.3 (125.95%+100%) not including general and administrative (G&A) overhead, and 2.81 when including G&A (125% engineering overhead, plus 25% on top of that amount for G&A) [DSMC]. This at least suggests that 2.4 is a plausible estimate. Clearly, these values vary widely by company and region; the information provided in this paper is enough to use different numbers if desired. These are the same values as used in my last report."

Note that these are year 2000 U.S. dollars.
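The arithmetic behind the roughly $135k figure is just salary times wrap rate. Here is a small sketch using the two numbers from the quote; the 8,000 person-years plugged in at the end is the rounded Red Hat Linux 7.1 effort figure from the Gigabuck paper, used only as an example input.

```python
ANNUAL_SALARY_USD = 56_286   # salary survey figure quoted above (year 2000 dollars)
WRAP_RATE = 2.4              # overhead multiplier quoted above


def loaded_cost_per_person_year(salary=ANNUAL_SALARY_USD, wrap=WRAP_RATE):
    """Salary plus overhead for one person-year."""
    return salary * wrap


def development_cost(person_years, salary=ANNUAL_SALARY_USD, wrap=WRAP_RATE):
    """Total cost for a given effort, in the same dollars as the salary."""
    return person_years * loaded_cost_per_person_year(salary, wrap)


print(f"per person-year: ${loaded_cost_per_person_year():,.0f}")
print(f"8,000 person-years: ${development_cost(8000):,.0f}")
```

If you think the salary or the wrap rate is wrong for your company or region, swap in your own values; the defaults are only the ones the paper used.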

Comments on CPAN: $677 Million of Perl · 2004-07-30 05:19 · Score: 1

SLOCCount measures "physical SLOC", and thus ignores blank lines and comment-only lines (including Perl PODs). It's not the same as "wc -l". Go read its documentation if you want to understand exactly what it does; it has a lengthy description of exactly what it measures, and why, along with references to the (substantial) research literature behind such tools.
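To give a feel for the difference from "wc -l", here is a deliberately simplified sketch of counting physical SLOC in Perl. It is NOT SLOCCount's actual algorithm (which handles many more cases); it just skips blank lines, lines that contain only a comment, and POD blocks, and the sample source is made up.

```python
def physical_sloc(perl_source: str) -> int:
    """Count lines that hold code: not blank, not comment-only, not inside POD."""
    count = 0
    in_pod = False
    for line in perl_source.splitlines():
        # A POD directive starts in column 0 with '=' plus a letter; '=cut' ends POD.
        if line[:1] == "=" and line[1:2].isalpha():
            in_pod = not line.startswith("=cut")
            continue
        if in_pod:
            continue
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank or comment-only (this simplification also skips '#!' lines)
        count += 1
    return count


sample = """\
# comment only

use strict;
my $x = 1;  # a trailing comment does not stop this line from counting

=head1 NAME

documentation goes here

=cut

print $x;
"""

print(len(sample.splitlines()), physical_sloc(sample))
```

The first number printed is what a plain line count would report; the second is the physical SLOC under this simplified rule.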

For effort, it works well on CPAN: $677 Million of Perl · 2004-07-30 05:17 · Score: 1

Actually, your argument suggests it's a reasonable measure. Yes, one line of Perl can do more than a typical line in C. So, a program in Perl should take fewer lines, and thus less effort, than if you'd done it in C. By measuring the SLOC of the Perl code, you can then estimate effort to create that Perl code, which would have been less in Perl than if it were done in C.

If your argument is, "this measure doesn't measure how many lines of code it would have taken in C", or "how much effort would it have taken if it was written in C", well, that's true. So what? That wasn't what was being measured. If that's what you wanted, there are well-known conversion factors where you can estimate the SLOC in C, and convert it to effort. But those conversion factors are estimates with a LOT of slop, and the published conversion factors have almost no published data to justify them, nor do they identify the ranges and standard deviations and other caveats. But if that's what you wanted, I'm not sure if there's a better way to do it.

Other studies: Red Hat Linux 7.1, Debian 2.2 on CPAN: $677 Million of Perl · 2004-07-30 05:03 · Score: 2, Informative

If you find this interesting, you might also want to take a look at my updated paper More than a Gigabuck: Estimating GNU/Linux's Size, which examines Red Hat Linux 7.1. The "Gigabuck" paper shows that:

It would cost over $1 billion (a Gigabuck) to develop this Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars).
It includes over 30 million physical source lines of code (SLOC).
It would have required about 8,000 person-years of development time, as determined using the widely-used basic COCOMO model.
Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2 (which was released about one year earlier).
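For readers who haven't seen it, the basic COCOMO effort equation is short. The sketch below uses the textbook organic-mode coefficients (2.4 and 1.05). I'm not claiming it reproduces the paper's totals: because the exponent is above 1, the answer depends on whether you apply the model to each component separately or to everything at once, and the component sizes here are arbitrary.

```python
def basic_cocomo_effort_person_months(sloc, a=2.4, b=1.05):
    """Basic COCOMO, organic mode: effort = a * (KSLOC ** b), in person-months."""
    return a * (sloc / 1000.0) ** b


def basic_cocomo_effort_person_years(sloc, a=2.4, b=1.05):
    """Same estimate expressed in person-years."""
    return basic_cocomo_effort_person_months(sloc, a, b) / 12.0


# Arbitrary sizes: two 50,000-line components versus one 100,000-line program.
separately = 2 * basic_cocomo_effort_person_years(50_000)
together = basic_cocomo_effort_person_years(100_000)
print(f"two components: {separately:.1f} person-years")
print(f"one program:    {together:.1f} person-years")
```

The model charges a little extra for size, so one large program is estimated to take more effort than the same number of lines split into independent pieces.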

Another related paper (that I didn't write) is Counting Potatoes: The size of Debian 2.2. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.

So what's the purpose of all these studies? Insight. There are all sorts of limitations in any measure, including any source lines of code (SLOC) measure. But, in spite of those limitations, there are things you can learn. Using tools (like SLOC counting tools) to measure software can help you understand things about the software, as long as you understand the limitations of the measure.

In particular, many studies have shown that SLOC is very strongly related to effort (so much so that you can even use equations to predict it). If you want to determine effort in CPAN, you can't just go ask people; few open source software / Free Software (OSS/FS) developers record exactly how much effort they invested. So, these kinds of measures are really helpful for estimating how much effort went into developing the software. Obviously, not all effort is equal (a genius can turn a hard problem into an easy one). And not all code is good, or even useful. But if you want to understand and measure effort, then these measures do have a value. In particular, these results have shown that OSS/FS can scale up to large projects requiring large amounts of effort.

Not sure Wikipedia keeps really old histories on Wikipedia Founder Jimmy Wales Responds · 2004-07-28 10:35 · Score: 1

I'm not sure that the Wikipedia keeps really old versions of articles; it may archive and remove them, to keep storage costs down.

Re:The Cautionary Tale of XFree86 on FreeBSD Moves to X.Org · 2004-07-24 16:45 · Score: 2, Informative

To my knowledge, they'll continue to use the original MIT/X license. It's known to be GPL-compatible (the main point of contention), and it's the license they've been using all along in general. It's certainly the direction of least resistance.

It turns out a few files have slipped in with licenses other than the MIT/X licenses. My appendix links to a detailed license analysis (I didn't do the analysis, kudos to the person who did!). But there aren't many such files, and it wouldn't take much to fix them. It's likely that some weren't even intentional, and contacting the authors would be all that's needed in some cases.

I very much doubt that they'd move to the GPL. This is a project shared between GPL'ed operating systems, *BSDs, proprietary X vendors, and proprietary OS vendors; a GPL move would break that. I guess it's conceivable they'll later move parts to the LGPL, particularly easily separable parts (like a sound server). Mesa was originally LGPL, for example. And the commercial environment has changed since X was started; some projects like Wine have decided to switch from MIT/X/BSD-like licenses to the LGPL, because they believed that too many commercial companies would take but not give back otherwise (rendering the project unable to continue). So you could argue that the changed environment might encourage them to use a different license to keep the project more viable. But I suspect that won't happen, at least in the short term. Most people seem to be interested in keeping the "status quo" MIT/X license, and more interested in rearchitecting and adding new features. I don't speak from special authority, just as someone who occasionally follows the discussions.

The Cautionary Tale of XFree86 on FreeBSD Moves to X.Org · 2004-07-24 11:42 · Score: 2, Interesting

More details on this story are in my appendix The Cautionary Tale of XFree86, part of my essay Make Your Open Source Software GPL-Compatible. Or Else.

Re:Marathon! on Will LOTR:ROTK Extended Edition Hit Cinemas? · 2004-07-07 10:47 · Score: 4, Funny

Don't forget breaks for second breakfast, elevenses, ...

US voting can be complicated on ACM Eyes Policy Position on Electronic Voting · 2004-07-03 06:22 · Score: 2, Interesting

It's true that if a vote has only one issue, hand counts aren't really that hard to do. Canada does them all the time, and it works well for them.

But in the U.S., most ballots are much more complicated. We (in the US) have a tradition of wanting the citizenry to speak out/vote directly on a number of different issues, and having separate local and state elections. It's a pain to set up a poll, and a pain to go to a poll, so a voting decision is actually more complicated for US citizenry than a non-US citizen might think. A vote might involve a federal election (a President, House member, and a Senator), a state election (a state senator / representative / governor), and a local election (county/town board, mayor, school board, sheriff, judges). It probably also involves multiple bond decisions ("shall the state take out a loan of $X to do Y"), and proposals to change the state constitution in various ways. When I go to a poll, I'd be shocked if there were fewer than 4 choices, and there are usually many, many more.

As a U.S. citizen, I'm used to it, and even like it -- it allows me to participate more directly in various decisions than citizens of some other democracies. And the multi-tiered approach to democracy is deeply embedded in how U.S. politics works. But the more complicated ballots, along with the sheer number of people in the U.S., make the purely manual approach more painful. It's still possible, of course, but some sort of automation is desirable.

Untrustworthy automation is a terrible idea, of course. Hopefully various organizations like the ACM and Verified Voting will change the system so that we can actually have confidence that our votes are being fairly counted.

Oh, and the problem in the 2000 election wasn't that recounts are illegal. Recounts happen occasionally in the U.S.; they're even required in certain cases. As I understand it, the problem was that recount rules are supposed to be consistent and clear before the election, and Florida's setup was revealed to be an absolute travesty. Of course, these unauditable electronic-only voting machines have exactly the same problem; there's no consistent and clear way to do a real recount, because there's nothing that can be independently recounted. Instead of creating a recount travesty, they need to make real recounts possible. And a computer-printed (and human-verified) paper vote would eliminate the nasty problems in the Florida 2000 election, where it was incredibly difficult to figure out the voter's intent from a card with multiple hanging chads (with more hanging chads created through handling!).

Have the patent issues been resolved? on Lead Developer of SPF Anti-Spam Scheme Interviewed · 2004-07-01 10:15 · Score: 2, Interesting

The original SPF is, to my knowledge, clear of software patent problems. BUT Microsoft has quite publicly stated that they're pursuing patents on their caller-id approach. So if they've created a combined approach, the combined approach has all the patent concerns of Microsoft's original approach.

So what's the story on the patent claims? Boycott Email caller id still seems wary, and I see no evidence that Eben Moglen's concerns (such as incompatibility with the GPL) have been addressed.

Any such patent application is likely to be granted, since Microsoft has lots of $$$ to press their case and the patent office has neither the knowledge nor the time to determine if the claims are obvious or to counter them in any other way.

I remember the joys of dealing with GIF patents. We're better off without this combined approach if the patent applications will make it unworkable.

So, what's the situation?

Here are lots of facts they forgot to mention on Microsoft's Magical 'Myth-Busting' Tour · 2004-06-11 16:05 · Score: 1

Take a look at Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, which has a large collection of quantitative studies suggesting that looking at OSS/FS (including GNU/Linux systems) is a good idea. Just updated as of a few days ago.

Any software acquirer should look at all sides of an issue, and not just take any vendor's word for how wonderful their products are.

Secure Programming HOWTO on PHP and SQL Security · 2004-04-27 03:17 · Score: 4, Informative

For guidelines on how to develop secure programs, see my Secure Programming for Linux and Unix HOWTO. This Free book provides a set of design and implementation guidelines for writing secure programs for Linux and Unix systems. That includes application programs used as viewers of remote data, web applications (including CGI scripts), network servers, and setuid/setgid programs. The book includes specific guidance for a number of languages, including C, C++, Java, Perl, Python, PHP, and Ada95.

User: dwheeler

Comments · 525