Domain: faqs.org
Stories and comments across the archive that link to faqs.org.
Stories · 30
-
Will Future Tesla Cars Use Metal-Air Batteries?
thecarchik writes "Most advocates and industry analysts expect lithium-ion batteries to dominate electric-car energy storage for the rest of this decade. But is Tesla Motors planning to add a new type of battery to increase the range of its electric cars? Tesla has filed for eight separate patents on uses of metal-air battery technology (for example, #20120041625). The metals covered for use in the metal-air battery are aluminum, iron, lithium, magnesium, vanadium, and zinc. Metal-air batteries, which slowly consume their anodes to give off energy, hit the news last month when Israeli startup Phinergy demonstrated its prototype battery and let reporters drive a test vehicle fitted with the energy-storage device. Mounted in a subcompact demonstration car, Phinergy's aluminum-air battery provides 1,000 miles of range, it said, and requires refills of distilled water (which acts as electrolyte in the cells) about every 200 miles." -
Big Brother Calls 'Shotgun' In Illinois
Reader kackle joins the army of free and accepted Slashdot submitters with this eyebrow-raising story: "I received a form letter from the Illinois State Toll Highway Authority saying that my first-generation 'IPASS' transponder needs to be replaced because the battery is old. I called them for clarification since the first-generation transponders obviously have user-replaceable batteries, and I wanted to keep this version because it beeps when a toll is paid. (This notifies drivers that their battery is still good, unlike the silent second-generation version, which informs them of a dead battery by sending a ticket in the mail.) The woman on the phone explained that they were replacing them just because the electronics are old. This uninformed answer made me research the device. I found that the manufacturer has recently filed a patent application for a new transponder that has a camera in it — a camera pointed inward at the occupants. How long before they make it illegal to cover that camera with tape?" -
Pigeon Protocol Finds a Practical Purpose
Selanit writes "Since David Waitzman wrote his tongue-in-cheek Standard for the Transmission of IP Datagrams on Avian Carriers, there have been occasional attempts to actually transmit information via pigeon. One group back in 2001 successfully sent a PING command. But now there's a practical use for pigeon-based communications: photographers working for the white-water rafting company Rocky Mountain Adventures send memory sticks full of digital photos via homing pigeon so the photos will be ready when the rafters finish up. The company has details on how the pigeons are trained and equipped. It may not be a full implementation of the Pigeon Protocol, but it works in narrow canyons far off the beaten path — and just as David Waitzman presciently predicted, they occasionally suffer packet loss due to hawks and ospreys." -
Lala Invents Network DRM
An anonymous reader writes in with a CNet story about the record-label-backed music company Lala, which claims to have invented "Network DRM." Lala has filed for a patent on moving DRM from a file wrapper, as used by Windows Media and FairPlay, to the server. Digital music veteran Michael Robertson has quotes from the patent application on his blog. (Here is the application.) Lala describes an invention that monitors every access, allows only authorized devices (so far there are none), blocks downloads, and can revoke content at the labels' request. -
Happy 40th Birthday, Internet RFCs
WayHomer was one of several readers to point out the 40th birthday of an important tool in the formation of the Internet, along with a look back at it by the author of the very first RFC. "Stephen Crocker in the New York Times writes, 'Today is an important date in the history of the Internet: the 40th anniversary of what is known as the Request for Comments (RFC).' 'RFC1 — Host Software' was published 40 years ago today, establishing a framework for documenting how networking technologies and the Internet itself work. Distribution of this memo is unlimited." -
SCO Chair's Anti-Porn Act Advances In Utah
iptables -A FORWARD writes "Gov. Jon Huntsman Jr. of Utah reportedly plans to sign a resolution urging Congress to enact the Internet Community Ports Act. The ICPA proposes that online content be divided by port, rather like TVs have channels with adult and family content, so that certain internet ports will be 'clean' — so-called Community Ports — and others will be 'dirty.' Thus, they hope to remove objectionable content from port 80 and require that it be moved elsewhere (port 666 was already taken by Doom, sorry), so that people could more easily block objectionable content, or have their ISPs do the blocking for them. This concept is being pushed by the CP80 group, which is chaired by Ralph Yarro, who also chairs the SCO Group. That probably explains why they didn't choose to adopt RFC 3514, instead." -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review.
Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
author: Jonathan A. Zdziarski
pages: 312
publisher: No Starch Press
rating: 8
reviewer: Shalendra Chhabra
ISBN: 1593270526
summary: Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent email) cost businesses billions of dollars. Many initiatives are currently underway to fight them. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries (in July and August 2005) covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, lists (blacklists, whitelists, greylists), email authentication standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, the spam kings, and different approaches to fighting spam, such as blacklisting, whitelisting, heuristic filtering, challenge-response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, and spammer fingerprinting. However, the author omitted any mention of locality-sensitive hash functions (such as the Nilsimsa hash) for countering spammers' random insertion of words, the use of CAPTCHAs (Completely Automated Public Turing Test to Tell Computers and Humans Apart), greylisting, Identified Internet Mail, and DomainKeys (now DomainKeys Identified Mail).
In the next chapter, the author clearly explains the various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), the Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more general, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with a description of Paul Graham's famous Bayesian spam-filtering technique (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinson's Inverse Chi-Square (including the source code for the inversion function), and some other tricks for optimizing spam-filtering accuracy.
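To make the three combining rules concrete, here is a minimal Python sketch. It assumes the per-token spam probabilities have already been computed from the historical dataset and clamped away from 0 and 1; the function names are hypothetical, and `chi2q` uses the standard series for the chi-square tail with even degrees of freedom rather than reproducing the book's own listing.

```python
import math
from typing import Sequence

# Each function takes per-token spam probabilities p in (0, 1). Real filters
# clamp p (for example to [0.01, 0.99]) so no logarithm below ever sees 0.

def graham_combine(probs: Sequence[float]) -> float:
    """Graham-style combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def robinson_geometric(probs: Sequence[float]) -> float:
    """Robinson's geometric-mean test, returned as an indicator in [0, 1]."""
    n = len(probs)
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)  # spamminess
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)        # hamminess
    S = (P - Q) / (P + Q)  # in [-1, 1]
    return (1.0 + S) / 2.0

def chi2q(x2: float, v: int) -> float:
    """P(chi-square with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0, "series form needs even degrees of freedom"
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_robinson(probs: Sequence[float]) -> float:
    """Fisher-Robinson inverse chi-square combination: 1.0 = spam, 0.0 = ham."""
    n = len(probs)
    # Many spammy tokens make -2 * sum(ln(1 - p)) large, so chi2q is small.
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam - ham) / 2.0
```

With strong evidence in both directions, `spam` and `ham` are both near 1 and the chi-square result sits near 0.5, which is why filters that use it often treat the middle of the range as "unsure".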
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then comes a discussion of the various tricks that spammers use to penetrate filters. Although these tactics are catalogued in John Graham-Cumming's "Spammers' Compendium," Jonathan very elegantly explains why some tricks work for spammers and some don't. This part concludes by addressing the resource, storage and scaling concerns raised by the large number of features that tokenization techniques generate.
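A toy version of the decode-then-tokenize step might look like the following. It is a sketch under simplifying assumptions: it relies on Python's standard `email` package to undo Base64 (or quoted-printable) transfer encoding, strips HTML tags with a crude regular expression rather than a real parser, and keeps lowercase alphanumeric runs. `extract_tokens` is a hypothetical name, and real filters keep far more detail (headers, URLs, tag attributes).

```python
import email
import re
from email import policy

TAG_RE = re.compile(r"<[^>]*>")                     # crude HTML tag matcher
TOKEN_RE = re.compile(r"[a-z0-9$][a-z0-9$'-]*")     # simple word-like tokens

def extract_tokens(raw_message: bytes) -> list[str]:
    """Decode each text part of a message and split it into tokens."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    tokens: list[str] = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        text = part.get_content()  # undoes Base64/QP and applies the charset
        if part.get_content_subtype() == "html":
            text = TAG_RE.sub(" ", text)  # drop markup, keep visible text
        tokens.extend(TOKEN_RE.findall(text.lower()))
    return tokens
```

Decoding before tokenizing matters because a Base64-encoded body would otherwise yield meaningless tokens that a spammer can vary at will.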
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring the accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word pairs and phrases into account, instead of individual words) generated using a sliding 5-word window as in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian model implemented in the CRM114 Discriminator, but the author fails to describe the different feature-weighting schemes implemented in the Markovian version of CRM114. The author then describes the Bayesian Noise Reduction technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Inoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and honeypot email addresses for snaring spammers' address-harvesting bots.
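CRM114's Sparse Binary Polynomial Hashing slides a five-word window over the text and, at each position, combines the newest word with every subset of the four preceding words, keeping the skipped slots so word order and spacing are preserved; that gives up to 2^4 = 16 features per position. The sketch below generates those features as readable strings (CRM114 hashes them instead) alongside plain word-pair chained tokens. The function names and the `<skip>` placeholder are hypothetical.

```python
from itertools import product

WINDOW = 5  # words per window, as in CRM114's SBPH

def sbph_features(words: list[str], window: int = WINDOW) -> list[str]:
    """SBPH-style features before hashing (a sketch, not CRM114's code)."""
    features = []
    for end in range(len(words)):
        history = words[max(0, end - window + 1):end]  # up to window-1 earlier words
        # Every keep/skip pattern over the history, always ending in the newest word.
        for mask in product((False, True), repeat=len(history)):
            parts = [w if keep else "<skip>" for w, keep in zip(history, mask)]
            features.append(" ".join(parts + [words[end]]))
    return features

def chained_tokens(words: list[str]) -> list[str]:
    """Single words plus adjacent word pairs."""
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]
```

For six words this yields 1 + 2 + 4 + 8 + 16 + 16 = 47 features, which illustrates the storage and scaling concerns raised in the second part of the book.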
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a thought-provoking introduction and concludes with a crisp "final thoughts" section. There are very few technical errors in this printing, and the illustrations are of good quality. Since the book is geared toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion of measuring spam-filter accuracy, and of the impact this has on setting filtration thresholds. A section on the economics of these tradeoffs, and on the use of a Receiver Operating Characteristic (ROC) curve, would have been very helpful.
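The threshold tradeoff the reviewer has in mind can be sketched in a few lines: score a labelled test corpus, sweep the spam threshold, and record the false-positive and true-positive rates at each setting. Those pairs form the ROC curve, and the area under it summarizes accuracy independently of any one threshold. The function names, and the convention that a score at or above the threshold means "spam", are assumptions for illustration.

```python
from typing import Sequence

def rates_at(scores: Sequence[float], labels: Sequence[bool],
             threshold: float) -> tuple[float, float]:
    """(false-positive rate, true-positive rate) when score >= threshold means spam.

    labels: True for spam, False for ham; both classes must be present.
    """
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return fp / n_ham, tp / n_spam

def roc_curve(scores: Sequence[float], labels: Sequence[bool]) -> list[tuple[float, float]]:
    """ROC points from the strictest threshold down to the most lenient."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        points.append(rates_at(scores, labels, t))
    return points  # ends at (1.0, 1.0): the lowest threshold flags everything

def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Lowering the threshold moves along the curve toward catching more spam at the price of more lost ham; where to sit depends on how much a false positive costs relative to a missed spam.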
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a graduate student in the Department of Computer Science and Engineering at the University of California, Riverside. He is on the development team of the CRM114 Discriminator and has presented his work at the MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
MD5 To Be Considered Harmful Someday
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al.'s collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-unequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be appended to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note." -
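The appendable-cascade property follows from MD5's Merkle-Damgård structure: once two equal-length inputs drive the internal chaining state to the same value, everything after them is processed identically. The sketch below demonstrates this on a deliberately weak toy hash, assumed here purely for illustration (a 16-bit chaining value built from truncated SHA-256, so a collision turns up by brute force in well under a second). It is not MD5 and does not reproduce Wang's attack.

```python
import hashlib
import struct

BLOCK = 8          # bytes per block in this toy construction
IV = b"\x00\x00"   # 16-bit initial chaining value

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16 bits of SHA-256(state || block)."""
    return hashlib.sha256(state + block).digest()[:2]

def pad(message: bytes) -> bytes:
    """MD-style padding: 0x80, zeros, then the 8-byte message length in bits."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    return padded + struct.pack(">Q", 8 * len(message))

def toy_hash(message: bytes) -> bytes:
    """Iterate the compression function over the padded message."""
    state = IV
    data = pad(message)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def find_block_collision() -> tuple[bytes, bytes]:
    """Birthday search for two one-block inputs with the same chaining value."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:  # pigeonhole: ends within 2**16 + 1 tries
        block = struct.pack(">Q", counter)
        out = compress(IV, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1
```

Because the two colliding blocks have the same length, any common suffix is padded identically, so `toy_hash(a + suffix) == toy_hash(b + suffix)` for every suffix. The same reasoning is what lets one payload ride on both of Wang's colliding files.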
OpenBSD Project Announces OpenBGPD
44BSD writes "As noted at undeadly, the OpenBSD Project has announced a BSD-licensed implementation of the Border Gateway Protocol, BGP. Project details, design goals, documentation, and more are at the project web site. BGP is documented in RFC 1771. Lucky for Cisco, BSD is dying..." -
Voice Over IP Goes Global, The DNS Way
awehttam writes "A couple of geeks have set up a non-profit public DNS root designed to map phone numbers to Internet protocols. These days we're hearing a lot about Skype and Voice over IP. Asterisk, the open source PBX, is nearing its version 1.00 release; Free World Dialup has applied to run the .tel top-level domain; the good old Bells are migrating to native IP; private-sector layer 2 clearing houses are exchanging bits between companies the likes of Packet8, China Telecom, MIT and Harvard; and even the various regulatory agencies are pondering just what to do about it all. In the meantime, consumer SIP phones are dropping in price, and free and open source software is helping to drive a new generation of service-provider networks." Read on for more. "You just knew the other shoe had to drop. E164.org lets people register their existing phone numbers and aim various services, including VoIP, towards a URL on the Internet. Now you can have your calls sent to your Free World Dialup account, or routed to your home Asterisk PBX instead, possibly where you have a $20 card attached to your phone line letting you make and receive calls through both your regular phone line and the Internet. E164.org isn't just about VoIP, though; it can also map phone numbers to email addresses, instant messenger URLs, or any other protocol that fits in the "foo://bar" scheme of the 'net. :)"
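In the standard ENUM scheme (RFC 3761), the number-to-name half of such a mapping is purely mechanical: reduce the number to its E.164 digits, reverse them, put a dot between each, and append the ENUM apex domain. A minimal sketch follows. That E164.org used exactly this layout under its own zone is an assumption here (the public ENUM tree lives under e164.arpa), and the actual service URIs would then come from NAPTR records at that name, which this sketch does not look up.

```python
def enum_domain(number: str, apex: str = "e164.arpa") -> str:
    """Map an E.164 phone number to its ENUM domain name (RFC 3761 style).

    Non-digit characters such as '+', spaces and dashes are ignored.
    """
    digits = [c for c in number if c in "0123456789"]
    if not 1 <= len(digits) <= 15:  # E.164 numbers have at most 15 digits
        raise ValueError(f"not a plausible E.164 number: {number!r}")
    # Least-significant digit first, one DNS label per digit.
    return ".".join(reversed(digits)) + "." + apex
```

For example, `enum_domain("+1 555 123 4567")` gives `7.6.5.4.3.2.1.5.5.5.1.e164.arpa`; reversing the digits puts the country code nearest the root, so delegation can follow the numbering plan.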
-
Voice Over IP On Wireless Mesh
infractor writes "ZDNet is reporting that the Linux-based LocustWorld Mesh system now has SIP routing at every node. The LocustWorld boxes have been widely used in community broadband projects where DSL is not available, so successfully that they have been seen as a threat to next-generation mobile networks. With the addition of VoIP support, these mesh networks can now compete with the telcos on voice as well as data services. More details here." -
Happy 35th birthday, RFC 1!
An anonymous reader writes "On April 7th, 1969, the first-ever RFC was published, describing the networking technology behind the then-nascent ARPAnet. In the intervening 35 years, networking technology has come a long way, but reflecting on how it all began puts the modern Internet in perspective." -
Implementing CIFS
Bombcar writes "Anyone who has used Microsoft products in the last ten years has used the SMB protocol (now known as CIFS). Some have become experts in the usage of Windows file sharing, Samba, and more. We know that there can be a 15-minute delay before new machines appear in 'Network Neighborhood'. We've read the Official Samba 3 book, and follow the Samba mailing list once in a while, perhaps even answering questions. But there is a limit to the knowledge given by these sources." Read on for Bombcar's review of Implementing CIFS from Prentice Hall.
Implementing CIFS
author: Christopher R. Hertel
pages: 642
publisher: Prentice Hall
rating: 8 of 10
reviewer: Tom Dickson
ISBN: 013047116X
summary: In-depth (but not too deep) coverage of the CIFS/SMB protocol
It is one thing to be able to use Samba, Windows, and the Common Internet File System (CIFS) protocol. It is another thing entirely to understand CIFS in enough depth to begin writing code that uses it. This is where Christopher Hertel's Implementing CIFS begins.
This thick book (over 600 pages) begins with a history of NetBIOS in the DOS era. It quickly progresses to NetBIOS over TCP/IP (which evolved into the current CIFS protocol). Hertel documents the origins of quirks that persist throughout the life of the protocol. An RFC was proposed in 1987, but many vendors have since added their own extensions. (It might surprise you to learn that Samba has added extensions too; these are covered in Chapter 24.)
After the basic overview, he quickly dives into real coding of an actual (though simple) implementation. This will be his style for the rest of the book (except for humorous asides now and then): an aspect of the protocol, such as Name Resolution, is explained in some detail and then illustrated in actual code (and in a few cases pseudocode).
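Name resolution shows the kind of quirk the book digs into. RFC 1001's "first-level encoding" turns a 16-byte NetBIOS name into 32 letters by splitting each byte into two 4-bit nibbles and adding each nibble to the letter 'A'. Below is a minimal Python sketch, not the book's own code. Uppercasing and padding to 15 characters plus a one-byte suffix follow the usual Microsoft convention, and the helper names are hypothetical.

```python
def encode_netbios_name(name: str, suffix: int = 0x20) -> str:
    """RFC 1001 first-level encoding of a NetBIOS name.

    The name is uppercased, space-padded to 15 bytes, and followed by a
    one-byte suffix (0x20, a space, is commonly used for file servers).
    """
    raw = name.upper().encode("ascii")
    if len(raw) > 15:
        raise ValueError("NetBIOS names are at most 15 characters")
    raw = raw.ljust(15, b" ") + bytes([suffix])
    # Each byte becomes two letters: high nibble first, each offset from 'A'.
    return "".join(chr(ord("A") + (b >> 4)) + chr(ord("A") + (b & 0x0F)) for b in raw)

def decode_netbios_name(encoded: str) -> tuple[str, int]:
    """Inverse of encode_netbios_name: returns (name, suffix byte)."""
    if len(encoded) != 32:
        raise ValueError("encoded NetBIOS names are 32 characters")
    raw = bytes(((ord(hi) - ord("A")) << 4) | (ord(lo) - ord("A"))
                for hi, lo in zip(encoded[0::2], encoded[1::2]))
    return raw[:15].decode("ascii").rstrip(" "), raw[15]
```

`encode_netbios_name("FRED")` produces `EGFCEFEECACACACACACACACACACACACA`, the example given in RFC 1001; the long run of `CA` pairs is simply the encoded space padding.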
The detail is good but not overwhelming. Some people (with names like Jerry Carter or Andrew Tridgell) will want more depth than this book provides, but with a protocol as varied as CIFS, choices have to be made. As the Samba website mentions, this book is written in "Geekish." The book covers aspects of older and newer SMB/CIFS implementations, including a description of the NTLM2 challenge/auth system.
One thing that should be noted is that the code examples work, but as the author points out, they usually have little or no error handling. This is common to many books, but it is something to remember.
Now, should you get this book? If you're just a user, you probably don't need it. But if you've ever wished you could understand the Samba technical mailing list, or wanted to know why it takes up to 15 minutes to see a new machine, then you'll enjoy this book. If you want to utilize CIFS in any manner (even if just implementing Samba for clients), I'd highly recommend reading this. It will help you to understand what is going on on your network, even if you're not writing the code yourself. And if you want to be a Samba coder, it is required reading.
What didn't I like? I first read the book in an airport, and found that it relies heavily on having access to a computer. I would have preferred more explanation of the code fragments than was given. However, this is a minor issue; most people who are implementing CIFS will be using a computer! I was also left wanting more information, but the large Appendix D, along with the many recommended sources, provides plenty of material for further study.
As a bonus, Appendix A tells you how to make a good cup of Earl Grey tea! That alone to some would be worth the price of admission.
You can purchase Implementing CIFS from bn.com. -
Why Is Free MUD Development Lagging?
Thanks to Skotos for its editorial discussing why free, open-source MUD development is failing to advance swiftly. The author notes "The best [text-based MUD] efforts have been almost entirely closed-source... Free MUDs, by contrast, just haven't advanced very fast." He points to several possible factors, suggesting that "MUD information is indexed poorly, and many projects don't maintain a web site with even a basic description of what they're doing", and continues: "Another reason is licensing. The Diku license is poorly understood and shoddily enforced... LPMUDs aren't much better", before concluding: "There is no existing license that does for MUD servers what the GPL does for applications. That grudging spread of features has never happened for MUD servers the way it has for GPL-licensed applications and libraries." -
Top Ten Handhelds That Didn't Make It?
Decaffeinated Jedi writes "Over at GameSpy, they're running a feature looking at the top ten handhelds that never made it. Included on the list are such 'favorites' as the Atari Lynx and the more recent Nokia N-Gage, as well as commentary by the GameSpy editors on why these portables failed to set the gaming world on fire." -
Microsoft To Remove Support For http(s) auth URLs
damohasi writes "According to the Microsoft Knowledge Base, MS "plans to release a software update that removes support for handling user names and passwords in HTTP and HTTP with Secure Sockets Layer (SSL) or HTTPS URLs in Microsoft Internet Explorer". Whether or not this breaks RFC 1738, it might get webspace providers that offer @-domains, like Germany's 1und1, into trouble." -
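RFC 1738's common Internet scheme syntax allows a `user:password@` prefix in front of the host, and that prefix is what makes spoofed links such as `http://www.example.com@evil.example/` work: the part before the `@` is treated as a user name, not the host. A short sketch using Python's standard `urllib.parse` shows how such a URL splits apart; the host names are made-up examples and `describe` and `has_userinfo` are hypothetical helpers.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Show which part of a URL is credentials and which is the real host."""
    parts = urlsplit(url)
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
    }

def has_userinfo(url: str) -> bool:
    """True if the URL carries a user name (and possibly a password) before the host."""
    return urlsplit(url).username is not None
```

For `http://www.example.com@evil.example/login`, `describe` reports the host as `evil.example` and the user name as `www.example.com`, which is exactly the confusion the Internet Explorer change is meant to close off.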
Explaining The Windows/UNIX Cultural Divide
giampy writes "Joel Spolsky has written a review-like article on Eric S. Raymond's latest book, The Art of Unix Programming. His views on the cultural differences between Windows and Unix programmers are well explained. Overall, an interesting read." Also on the topic of Windows, badriram writes "Microsoft is reorganizing the Windows team; it seems they are separating out core OS development. Things seem to be heading in the right direction toward a more secure, more business-oriented OS. Read the article here." -
Chock Full o' NetBSD!
jschauma writes "While it's no Indigo Espresso or a VAX Bar (though, of course, there are NetBSD/sgimips and NetBSD/vax), at least you can log in on a Mr. Coffee. And while the JavaStation has been running NetBSD for a while, full support is now completely in-tree: NetBSD's Martin Husemann announced today that he has fixed all outstanding issues with JavaStation support. This means that you can now run your JavaStation with a stock distribution of NetBSD/sparc. The JavaStation-NC is a network-computer-class machine built on the microSPARC-IIep processor. More information about the JavaStation can be found in the JavaStation HOWTO, Martin's email to the port-sparc mailing list and Valeriy E. Ushakov's paper 'Porting NetBSD to JavaStation-NC.'" -
The Linux Development Platform
honestpuck writes "Back before the advent of Mac OS X, my favourite (and for many years, only) development environment was one variety of Unix or another. The nicest thing about Unix was that the development environment stayed pretty much the same regardless of the variety; this stayed the same with the introduction of Linux." Honestpuck examines how true this still is (as well as how accurate the chosen title is) in his review of Prentice Hall's The Linux Development Platform, below.
The Linux Development Platform
author: Rafeeq Ur Rehman and Christopher Paul
pages: 320
publisher: Prentice Hall PTR
rating: 7
reviewer: Tony Williams
ISBN: 0130091154
summary: Good guide to developer tools
The Linux Development Platform might be better titled "The GNU Development Platform" since almost all of the tools discussed come from the FSF, and those that don't are nevertheless open source; as a result they will run on almost any Unix variety. You know that the 'Linux' in the title is almost just a marketing ploy, but we will forgive Prentice Hall and the authors. Certainly more people will buy this book to learn about using these tools under Linux than under any other *nix variety.
The book starts with a short chapter on software development per se before getting down to the nuts and bolts. It starts in the obvious spot, with editors, and quickly covers choosing an editor before taking a brief look at Emacs, Jed and VIM. The rest of the book is devoted to much less contentious issues.
As a whole, the text provides a good grounding in using gcc, make, CVS and GDB, with enough extra information on smaller tools and larger issues (such as cross-platform and embedded systems) that you will not need more than this book and, perhaps, the man pages to understand and use these tools. Of course, others have written entire volumes on each of these topics, but for most of us this book will provide the information we need.
The Linux Development Platform comes with a CD containing the source for a fair number of the tools discussed, so you can build any tools which happen to be missing on your platform, though some of the included apps are, of course, already a version or two behind.
The writing is mixed in quality: while never bad, it has a slightly heavy, technical feel to it, often a bit wordy or cumbersome. This rarely gets in the way of understanding, but it does slow you down. The topic coverage is good, moving from a beginner level right through to a good understanding of each tool discussed. More importantly, all the tools you will need are covered.
I imagine this would make an excellent companion text for any programming course: note that it doesn't provide details on any programming language, but covers everything else you need to know regarding the development tools. It is thinnest in the discussion of editors, really only giving a brief overview of each. I cannot really see this as a fault since detailed coverage really would take a separate book, and this quick look is better than pretending to cover the topic well and failing. The other possible weakness is that there is almost no coverage of general Linux usage, so calling the book The Linux Development Platform is a bit of a misnomer -- it is really devoted to the tools available for development, not the underlying operating system at all. Once again, I feel that this lack is not serious; most buyers should know enough about the operating system and any attempt to cover it adequately would have swelled the size and cost of the book.
Prentice Hall PTR have a site for the book with a Table of Contents or you can see the whole book in HTML format at FAQs.org.
I would recommend this book to anyone who would like a good, general introduction to developing software on a Unix platform. Though it's not a cheap book, it is a good one. It was certainly a relief for me to find a good book in Prentice Hall's 'Bruce Perens' Open Source Series' after a couple of flawed ones.
You can purchase The Linux Development Platform from bn.com. -
Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports on the joint move by Switzerland, Germany and Austria to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)" -
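Part of what makes validation harder is that the check now has to run on the ASCII-compatible (Punycode) form of each label. Below is a minimal sketch using Python's built-in `idna` codec, which implements the IDNA 2003 rules (RFC 3490) of this era; current registries follow IDNA 2008, which the third-party `idna` package implements. The helper names are hypothetical, and the 253-character and letters-digits-hyphen limits are the usual DNS host-name rules.

```python
import re

# One ASCII label: 1-63 letters, digits or hyphens, no leading or trailing hyphen.
LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def to_ascii(domain: str) -> str:
    """Unicode domain -> ASCII-compatible form via Python's IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")

def is_valid_hostname(domain: str) -> bool:
    """Check a possibly internationalized host name against LDH rules after ToASCII."""
    if domain.endswith("."):
        domain = domain[:-1]  # allow a single trailing root dot
    try:
        ascii_name = to_ascii(domain)
    except UnicodeError:
        return False  # empty or over-long label, or a name nameprep rejects
    if len(ascii_name) > 253:
        return False
    return all(LABEL_RE.match(label) for label in ascii_name.split("."))
```

`to_ascii("bücher.example")` returns `xn--bcher-kva.example`; a naive ASCII-only validator would reject the Unicode form outright, while one that skips the conversion would miss over-long encoded labels.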
Defense and Detection Against Internet Worms
Rathumos writes "The network security world has been waiting patiently for a definitive study of internet worms and defenses against them. Defense and Detection Strategies against Internet Worms by Dr. Jose Nazario has arrived to fill that space with a clear and concise analysis of the current state of worm defense." Read on for the rest of Rathumos' review.
Defense and Detection Strategies against Internet Worms
author: Jose Nazario
pages: 322
publisher: Artech House
rating: 10
reviewer: Duncan Lowne
ISBN: 1580535372
summary: This book provides a solid approach toward detection and mitigation of worm-based attacks.
Publishing a book on a subject as dynamic as internet worms can never result in a complete volume. The near-weekly outbreaks of modified versions of old worms and of completely new designs are enough to frustrate the efforts of even the most prolific anti-virus software developers, let alone those who try to provide an overview of their study.
Nevertheless, Nazario delivers a clear and concise summary of the state of worms today. Seeded by a paper ('The Future of Internet Worms', Nazario, Anderson, Connelly, Wash) written in 2001, Defense and Detection Strategies against Internet Worms encourages the reader to focus on the directions worm development might take in the future, with a specific view toward anticipating, and preparing for, future attacks.
The book begins with a discussion of how worms depart from traditional computer viruses. An outline of the benefits of a worm-based attack for the black hat, as well as a brief analysis of the threat model posed by worms, provides ample reason for the computer security professional to take the study of internet worms very seriously.
Beyond this introduction, the book is laid out in four major sections. The first introduces the reader to some background information crucial to the study of worms. The author discusses the history and taxonomy of past worm outbreaks, from their sci-fi origins (think John Brunner's Shockwave Rider) through modern-day outbreaks. A thorough analysis of various worms' traffic patterns is presented, with data broken down by infection rates, number of infected hosts, and number of sources probing specific subnets. Finally, the construction and lifecycle of worms are presented, with particular attention paid to the interaction between the worms' propagation techniques and the progression of their lifecycles.
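Infection-rate curves like these are commonly compared with the logistic "random constant spread" model of Staniford, Paxson and Weaver, in which each infected host compromises new victims at a constant rate k within a fixed vulnerable population, giving da/dt = k a (1 - a) for the infected fraction a. The sketch below assumes that model rather than anything specific to the book, and its parameter values are illustrative, not measurements.

```python
import math

def infected_fraction(t: float, k: float, t_mid: float) -> float:
    """Closed-form solution of da/dt = k * a * (1 - a).

    a is the fraction of vulnerable hosts infected, k the per-host compromise
    rate, and t_mid the time at which half the vulnerable hosts are infected.
    """
    x = k * (t - t_mid)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # rearranged to avoid overflow for very negative x
    return e / (1.0 + e)

def simulate(vulnerable: int, k: float, initial: int,
             dt: float, steps: int) -> list[float]:
    """Euler integration of the same equation, counted in hosts rather than fractions."""
    infected = float(initial)
    history = [infected]
    for _ in range(steps):
        infected += k * infected * (1.0 - infected / vulnerable) * dt
        history.append(infected)
    return history
```

Early on the curve is indistinguishable from exponential growth; it then bends over as the pool of vulnerable hosts runs out, which is the characteristic S shape in plots of infected-host counts.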
The second section of the book (ch. 6 - 8) studies the trends exhibited by past worm outbreaks. Beginning with an examination of the processes and mechanisms of infection, it progresses to a survey of the network topologies generated by a worm's distribution. Specific infection patterns are examined, along with case studies of worm outbreaks that have exhibited such patterns. Further, this section examines the common characteristics of vulnerable targets, from older UNIX and VMS mainframes through desktop systems onward to infrastructure equipment and embedded systems. A discussion of the payload transmission methods that have made recent worm attacks so devastatingly effective, and an explanation of why liberal use of a clue-hammer on users is not by itself enough to control and prevent further outbreaks, complement chapter nine's analysis of, and speculation about, the future of internet worms.
Section three (ch. 9 - 11) focuses on worm detection strategies, and is more distinctly aimed at the already-overworked network security professional. Effective methods of detecting scans and analyzing a worm's scan engine are presented with a focus on timely and efficient protection from further infection. Monitoring techniques for quickly recognizing, analyzing and responding to worm outbreaks lead into a detailed description of well-placed honeypots and dark network monitors ("black holes"). Discussion of the (so far) most effective method of worm detection, signature analysis, completes the section; it covers host-based and logfile signatures, along with a brief overview of analyzing logfiles using commonly available utilities.
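Two of the detection ideas in this section are easy to sketch: flag any source that sends traffic into an unused ("dark") address block, and flag sources that contact unusually many distinct destinations within a time window. The sketch assumes flow records have already been reduced to (timestamp, source, destination) tuples; the darknet prefix, window and threshold are illustrative values, not recommendations from the book, and the helper names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable

Flow = tuple[float, str, str]  # (timestamp in seconds, source IP, destination IP)

def darknet_hits(flows: Iterable[Flow], dark: str) -> set[str]:
    """Sources that sent anything into an unused (dark) address block."""
    net = ipaddress.ip_network(dark)
    return {src for _, src, dst in flows if ipaddress.ip_address(dst) in net}

def fanout_scanners(flows: Iterable[Flow], window: float = 60.0,
                    threshold: int = 50) -> set[str]:
    """Sources that contact more than `threshold` distinct hosts in one window.

    Uses fixed (tumbling) windows for simplicity; a sliding window would also
    catch scans that straddle a window boundary.
    """
    buckets: dict[tuple[str, int], set[str]] = defaultdict(set)
    for ts, src, dst in flows:
        buckets[(src, int(ts // window))].add(dst)
    return {src for (src, _), dsts in buckets.items() if len(dsts) > threshold}
```

Darknet monitoring has essentially no false positives, since no legitimate host should be talking to unused addresses, while fan-out counting catches scanners that never touch the dark block but needs tuning to avoid flagging busy servers such as mail relays.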
The final section of the book (ch. 12 - 16), as the title suggests, aims at defense strategies against worm outbreaks. Beginning with the obvious first steps that anyone reading the book ought to have already implemented (firewalls, virus detection software, sandboxing, and patching-patching-patching), the section progresses into less widely used but equally important proxy-based defense methods, and continues on to cover slowing down infection rates and fighting back against existing worm networks. For the sake of thoroughness, an overview of the legal implications of attacking worm nodes receives its fair share of attention, if only to alert the reader to the potential pitfalls of proactive defense.
Defense and Detection Strategies against Internet Worms is decidedly aimed at the experienced network security professional, but holds a much broader appeal than most technical books. With its thorough historical analysis of worm progression over the past thirty years, anyone with even a remote interest in the past, present or future of the only network security issues to consistently make headlines in the mainstream press will find this both an entertaining and enlightening read. Overall, it makes a valuable addition to any geek's bookshelf.
You can purchase Defense and Detection Strategies against Internet Worms from bn.com. -
The "Techie" Vote?
Ironica writes "This Los Angeles Times article discusses a compelling trend: techies are making their collective voice heard in politics. Quote from the article: "After years as political agnostics, the programmers and engineers who orchestrated the technological revolution of the 1990s are trying to reboot government...They have money, earned during the boom. They have time, found since the bust. And they are using their technological savvy to recruit even casual Internet users to their causes." Perhaps instead of "boxers or briefs," our next presidential candidate will have to answer "POP3 or IMAP?"" -
Mathematica vs. Matlab?
Ninnux asks: "I wanted to find out from the community which was the better mathematics modeling package: Mathematica or Matlab. The cancer center I research and program for is considering purchasing a license set. I'll be working with Bayesian machine learning and other bioinformatic approaches for hormone pathway modeling. I know Matlab has various toolboxes that would be rather useful, but I'd like to hear what people think." While I'm sure direct comparisons will be made, I think focusing on the specific niche will help Ninnux the most; so, how well does each piece of software handle Bayesian functions and other bioinformatic computations? -
The Myth of Open Source Security Revisited v2.0
Dare Obasanjo contributed this followup to an article entitled The Myth of Open Source Security Revisited that appeared on the website kuro5hin. He writes: "The original article tackled the common misconception amongst users of Open Source Software (OSS) that OSS is a panacea when it comes to creating secure software. The article presented anecdotal evidence taken from an article written by John Viega, the original author of GNU Mailman, to illustrate its point. This article follows up the anecdotal evidence presented in the original paper by providing an analysis of similar software applications, their development methodology and the frequency of the discovery of security vulnerabilities." Read on below for his detailed analysis, which is especially relevant given the current security initiatives in the worlds of both open- and closed-source software.
The Myth of Open Source Security Revisited v2.0 The purpose of this article is to expose the fallacy of the belief in the "inherent security" of Open Source software and instead to point to a truer means of ensuring that the security of a piece of software is high.
Apples, Oranges, Penguins and Daemons
When performing experiments to confirm a hypothesis on the effect of a particular variable on an event or observable occurrence, it is common practice to utilize control groups. In an attempt to establish cause and effect in such experiments, one tries to hold all variables that may affect the outcome constant except for the variable that the experiment is interested in. Comparisons of the security of software created by Open Source processes and software produced in a proprietary manner have typically involved several variables besides development methodology.
A number of articles have been written that compare the security of Open Source development to proprietary development by comparing security vulnerabilities in Microsoft products to those in Open Source products. Noted Open Source pundit Eric Raymond wrote an article on NewsForge where he compares Microsoft Windows and IIS to Linux, BSD and Apache. In the article, Eric Raymond states that Open Source development implies that "security holes will be infrequent, the compromises they cause will be relatively minor, and fixes will be rapidly developed and deployed." However, upon investigation it is disputable that Linux distributions have less frequent or more minor security vulnerabilities when compared to recent versions of Windows. In fact, the belief in the inherent security of Open Source software over proprietary software seems to be the product of a single comparison, Apache versus Microsoft IIS.
There are a number of variables involved when one compares the security of software such as Microsoft Windows operating systems to Open Source UNIX-like operating systems, including the disparity in their market share, the requirements and dispensations of their user base, and the differences in system design. To better compare the impact of source code licensing on the security of the software, it is wise to reduce the number of variables that will skew the conclusion. To this effect, it is best to compare software with similar system design and user base rather than software applications that are significantly distinct. The following section analyzes the frequency of the discovery of security vulnerabilities in UNIX-like operating systems including HP-UX, FreeBSD, RedHat Linux, OpenBSD, Solaris, Mandrake Linux, AIX and Debian GNU/Linux.
Security Vulnerability Face-Off
Below is a listing of UNIX and UNIX-like operating systems with the number of security vulnerabilities that were discovered in them in 2001 according to the Security Focus Vulnerability Archive.
- AIX: 10 vulnerabilities [6 remote, 3 local, 1 both]
- Debian GNU/Linux: 13 vulnerabilities [1 remote, 12 local] + 1 Linux kernel vulnerability [1 local]
- FreeBSD: 24 vulnerabilities [12 remote, 9 local, 3 both]
- HP-UX: 25 vulnerabilities [12 remote, 12 local, 1 both]
- Mandrake Linux: 17 vulnerabilities [5 remote, 12 local] + 12 Linux kernel vulnerabilities [5 remote, 7 local]
- OpenBSD: 13 vulnerabilities [7 remote, 5 local, 1 both]
- Red Hat Linux: 28 vulnerabilities [5 remote, 22 local, 1 unknown] + 12 Linux kernel vulnerabilities [6 remote, 6 local]
- Solaris: 38 vulnerabilities [14 remote, 22 local, 2 both]
From the above listing one can infer that source licensing is not a primary factor in determining how prone to security flaws a software application will be. Specifically, proprietary and Open Source UNIX family operating systems are represented on both the high and low ends of the frequency distribution.
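To make the "both ends" observation concrete, the 2001 counts can be ranked programmatically. What follows is a minimal C sketch, not part of the original analysis: it uses only each system's own count (the separately listed Linux kernel vulnerabilities are left out), and the Open Source/proprietary flag on each entry is a label added here for illustration rather than data from the Security Focus archive. The names `os_entry`, `listing_2001` and `rank_listing` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One row of the 2001 listing. The is_open flag is an added label,
   not part of the Security Focus data. Linux kernel entries listed
   separately in the article are deliberately left out. */
struct os_entry {
    const char *name;
    int vulns;    /* vulnerabilities attributed to the OS itself */
    int is_open;  /* 1 = Open Source, 0 = proprietary */
};

static struct os_entry listing_2001[] = {
    {"AIX", 10, 0},            {"Debian GNU/Linux", 13, 1},
    {"FreeBSD", 24, 1},        {"HP-UX", 25, 0},
    {"Mandrake Linux", 17, 1}, {"OpenBSD", 13, 1},
    {"Red Hat Linux", 28, 1},  {"Solaris", 38, 0},
};
#define LISTING_LEN (sizeof listing_2001 / sizeof listing_2001[0])

/* qsort comparator: ascending by count, ties broken by name so the
   resulting order is deterministic. */
static int by_vulns(const void *a, const void *b) {
    const struct os_entry *x = a, *y = b;
    if (x->vulns != y->vulns)
        return (x->vulns > y->vulns) - (x->vulns < y->vulns);
    return strcmp(x->name, y->name);
}

/* Sort the listing in place, then count how many Open Source systems
   appear among the k lowest (*open_low) and k highest (*open_high). */
static void rank_listing(size_t k, int *open_low, int *open_high) {
    assert(k <= LISTING_LEN / 2); /* the two ends must not overlap */
    qsort(listing_2001, LISTING_LEN, sizeof listing_2001[0], by_vulns);
    *open_low = *open_high = 0;
    for (size_t i = 0; i < k; i++) {
        *open_low += listing_2001[i].is_open;
        *open_high += listing_2001[LISTING_LEN - 1 - i].is_open;
    }
}
```

Calling `rank_listing(2, &lo, &hi)` puts AIX first and Solaris last, and with these labels it finds one Open Source system among the two lowest entries and one among the two highest. Adding the kernel counts would move Red Hat Linux (40) above Solaris (38), but the two highest entries would still be one Open Source and one proprietary system.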
Factors that have been known to influence the security and quality of a software application include practices such as code auditing (peer review), security-minded architecture design, strict software development practices that restrict certain dangerous programming constructs (e.g. using the str* or scanf* family of functions in C) and validation & verification of the design and implementation of the software. Reducing the focus on deadlines and only shipping when the system is in a satisfactory state is also important.
Both the Debian and OpenBSD projects exhibit many of the aforementioned characteristics, which helps explain why they are the Open Source UNIX operating systems with the best security record. Debian's track record is particularly impressive when one realizes that Debian 2.2 ("Potato") consists of over 55 million lines of code (compared to Red Hat's 30 million lines of code).
The Road To Secure Software
Exploitable security vulnerabilities in a software application are typically evidence of bugs in the design or implementation of the application. Thus the process of writing secure software is an extension of the process behind writing robust, high quality software. Over the years a number of methodologies have been developed to tackle the problem of producing high quality software in a repeatable manner within time and budgetary constraints. The most successful methodologies have typically involved using the following software quality assurance, validation and verification techniques: formal methods, code audits, design reviews, extensive testing and codified best practices.
- Formal Methods: One can use formal proofs based on mathematical
methods and rigor to verify the correctness of software algorithms. Tools
for specifying software using formal techniques, such as VDM and Z, exist.
Z (pronounced 'zed') is a formal specification notation based on set
theory and first order predicate logic. VDM stands for "The Vienna
Development Method" which consists of a specification language called
VDM-SL, rules for data and operation refinement which allow one to
establish links between abstract requirements specifications and
detailed design specifications down to the level of code, and a proof
theory in which rigorous arguments can be conducted about the properties
of specified systems and the correctness of design decisions. The
previous descriptions were taken from the
Z FAQ and the
VDM FAQ
respectively. A comparison of both specification languages is
available in the paper,
Understanding the differences between VDM and Z
by I.J. Hayes et al.
- Code Audits: Reviews of source code by developers other than the
author of the code are good ways to catch errors that may have been
overlooked by the original developer. Source code audits can vary from
informal reviews with little structure to formal code inspections or
walkthroughs. Informal reviews typically involve the developer sending
the reviewers source code or descriptions of the software for feedback
on any bugs or design issues. A walkthrough involves the detailed
examination of the source code of the software in question by one or more
reviewers. An inspection is a formal process where a detailed examination
of the source code is directed by reviewers who act in certain roles. A
code inspection is directed by a "moderator", the source code is read by a
"reader" and issues are documented by a "scribe".
- Testing: The purpose of testing is to find failures. Unfortunately,
no known software testing method can discover all possible failures that
may occur in a faulty application and metrics to establish such details
have not been forthcoming. Thus a correlation between the quality of a
software application and the amount of testing it has endured is
practically non-existent.
There are various categories of tests, including unit, component, system, integration, regression, black-box, and white-box tests. There is some overlap among these categories.
Unit testing involves testing small pieces of an application's functionality, such as methods, functions or subroutines. In unit testing it is usual for other components that the software unit interacts with to be replaced with stubs or dummy methods. Component tests are similar to unit tests, except that the dummy and stub methods are replaced with the actual working versions. Integration testing involves testing related components that communicate with each other, while system testing involves testing the entire system after it has been built. System testing is necessary even if extensive unit or component testing has occurred, because separate subroutines can work individually but fail when invoked sequentially due to side effects or some error in programmer logic. Regression testing is the process of ensuring that modifications to a software module, component or system have not introduced errors into the software. A lack of sufficient regression testing is one of the reasons why certain software patches break components that worked prior to installation of the patch.
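The stub-versus-real distinction can be sketched in C. The function `session_expired` below is a hypothetical unit invented for this illustration, not something from the article; its clock is passed in as a function pointer so that a unit test can substitute a stub that always returns the same time, while a component test wires in the real `time()` call.

```c
#include <assert.h>
#include <time.h>

/* The clock is injected so tests can control it. */
typedef time_t (*clock_fn)(void);

/* Hypothetical unit under test: has more than max_age_s seconds
   passed since the session started? */
static int session_expired(time_t started, double max_age_s, clock_fn now) {
    return difftime(now(), started) > max_age_s;
}

/* Stub used in unit testing: always reports the same instant. */
static time_t stub_now(void) { return (time_t)1000; }

/* Real dependency used in component testing. */
static time_t real_now(void) { return time(NULL); }

/* Unit test: deterministic, because the stub fixes "now" at 1000. */
static void unit_test_session_expired(void) {
    assert(!session_expired((time_t)1000, 60.0, stub_now)); /* age 0 s */
    assert(!session_expired((time_t)940, 60.0, stub_now));  /* age 60 s: boundary */
    assert(session_expired((time_t)939, 60.0, stub_now));   /* age 61 s */
}

/* Component test: same unit, but with the real clock plugged in. */
static void component_test_session_expired(void) {
    time_t t = real_now();
    assert(!session_expired(t, 3600.0, real_now)); /* just started */
}
```

Keeping tests like these and rerunning them after every change, including the 60-second boundary case, is a small-scale form of regression testing.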
Black-box testing, also called functional or specification testing, tests the behavior of the component or system without requiring knowledge of the internal structure of the software. Black-box testing is typically used to test that software meets its functional requirements. White-box testing, also called structural or clear-box testing, involves tests that utilize knowledge of the internal structure of the software. White-box testing is useful for ensuring that particular statements in the program are exercised and that errors in them are discovered. Code coverage tools help in discovering what percentage of a system is being exercised by the tests.
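As a minimal sketch of the difference, consider a byte-summing routine whose specification says only "return the sum of the bytes". The function and tests are hypothetical, written for this illustration. A black-box tester works from that one sentence. A white-box tester can see that the code has a four-bytes-at-a-time loop plus a tail loop, and can pick input lengths that drive each path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unit: sum of the bytes in buf. Internally it handles
   four bytes per iteration and finishes the remainder in a tail loop,
   a detail visible only by reading the code. */
static unsigned long sum_bytes(const unsigned char *buf, size_t len) {
    unsigned long total = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)   /* unrolled main loop */
        total += buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < len; i++)           /* tail: 0 to 3 leftover bytes */
        total += buf[i];
    return total;
}

/* Black-box tests: derived only from the specification. */
static void black_box_tests(void) {
    const unsigned char data[] = {1, 2, 3};
    assert(sum_bytes(data, 3) == 6);
    assert(sum_bytes(data, 0) == 0);
}

/* White-box tests: lengths chosen from the code's structure. */
static void white_box_tests(void) {
    const unsigned char data[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(sum_bytes(data, 4) == 10); /* main loop once, tail skipped */
    assert(sum_bytes(data, 5) == 15); /* main loop once, tail once */
    assert(sum_bytes(data, 9) == 45); /* main loop twice, tail once */
}
```

The black-box tests alone never execute the unrolled loop, because neither input is four or more bytes long. That is the kind of gap a code coverage report, for example one produced by compiling with GCC's `--coverage` option and running `gcov`, would make visible.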
More information on testing can be found in the comp.software.testing FAQ.
- Design Reviews: The architecture of a software application can be
reviewed in a formal process called a design review. In design reviews the
developers, domain experts and users check that the design of the
system meets the requirements and that it contains no significant flaws
of omission or commission before implementation occurs.
- Codified Best Practices: Some programming languages have libraries
or language features that are prone to abuse and are thus prohibited in
certain disciplined software projects. Functions like
strcpy, gets, and scanf in C are examples of poorly designed library functions that allow malicious individuals to use buffer overflows or format string attacks to exploit the security vulnerabilities exposed by using these functions. A number of platforms explicitly disallow gets, especially since alternatives exist. Programming guidelines, such as those written by Peter Galvin in a Unix Insider article on designing secure software, are used by development teams to reduce the likelihood of security vulnerabilities in software applications.
Issues Preventing Development of Secure Open Source Software
One of the assumptions that is typically made about Open Source software is that the availability of source code translates to "peer review" of the software application. However, the anecdotal experience of a number of Open Source developers including John Viega belies this assumption.
The term "peer review" implies an extensive review of the source code of an application by competent parties. Many Open Source projects do not get peer reviewed, for a number of reasons including:
- the complexity of the code, combined with a lack of documentation, makes it difficult for casual users to understand the code well enough to give a proper review.
- developers making improvements to the application typically focus only on the parts of the application that affect the feature being added instead of on the whole system.
- developers are often unaware of security concerns.
- complacency, in the belief that since the source is available it is being reviewed by others.
Benefits of Open Source to Security-Conscious Users
Despite the fact that source licensing and source code availability are not indicators of the security of a software application, there is still a significant benefit of Open Source to some users concerned about security. Open Source allows experts to audit their software options before making a choice and also in some cases to make improvements without waiting for fixes from the vendor or source code maintainer.
One should note that there are constraints on the feasibility of users auditing the software based on the complexity and size of the code base. For instance, it is unlikely that a user who wants to make a choice of using Linux as a web server for a personal homepage will scrutinize the TCP/IP stack code.
References
- Frankl, Phyllis et al. Choosing a Testing Method to Deliver Reliability. Proceedings of the 19th International Conference on Software Engineering, pp. 68-78, ACM Press, May 1997. <http://citeseer.nj.nec.com/frankl97choosing.html>
- Hamlet, Dick. Software Quality, Software Process, and Software Testing. 1994. <http://citeseer.nj.nec.com/hamlet94software.html>
- Hayes, I.J., C.B. Jones and J.E. Nicholls. Understanding the differences between VDM and Z. Technical Report UMCS-93-8-1, University of Manchester, Computer Science Dept., 1993. <http://citeseer.nj.nec.com/hayes93understanding.html>
- Miller, Todd C. and Theo de Raadt. strlcpy and strlcat - consistent, safe, string copy and concatenation. Proceedings of the 1999 USENIX Annual Technical Conference, FREENIX Track, June 1999. <http://www.usenix.org/events/usenix99/full_papers/millert/millert_html/>
- Viega, John. The Myth of Open Source Security. Earthweb.com. <http://www.earthweb.com/article/0,,10455_626641,00.html>
- Gonzalez-Barona, Jesus M. et al. Counting Potatoes: The Size of Debian 2.2. <http://people.debian.org/~jgb/debian-counting/counting-potatoes/>
- Wheeler, David A. More Than A Gigabuck: Estimating GNU/Linux's Size. <http://www.counterpane.com/crypto-gram-0003.html>
Acknowledgements
The following people helped in proofreading this article and/or offering suggestions about content: Jon Beckham, Graham Keith Coleman, Chris Bradfield, and David Dagon. © 2002 Dare Obasanjo -
Controlling the Noise?
Quite a few submitters have asked "How do you make a quiet PC". Well, rather than tackle it from the PC standpoint, how about devices that can quiet a whole environment? Along these lines, 16977 asks: "I've been considering building an active noise control system for an area about the size of a closet. ANC today doesn't work quite as well as it did in Silence, Please (works best for low frequencies, only covers small areas, etc.), but it is still a fascinating technology. I'm wondering if anyone out there has done similar projects with either the hardware or controlling software of ANC, and what information they have to share." And since I have your attention on this subject, sammy.lost-angel.com asks: "I would like to ask the slashdot community about their recommendations for noise-cancelling headphones. Traveling in planes is very noisy, and with MP3 players becoming more and more common, I would like to hear some experiences with various different noise cancelling headphones paired up with MP3 players. How well do they work in general? What are the best and most cost-effective headphones available?" -
Beyond The Cell -- Journalists' Video Phone
dimitri_k writes: "This article from poynter.org gives some information about the video phone that has become standard in reporting recently. It uses H.263 for compression, and a satellite phone to call into ISDN lines. Maybe people on Slashdot can brainstorm ways to increase the bandwidth of these things in the short term (i.e. cost-ineffective combination of lines) so that the cable news networks can make the grainy, live, night-vision shots from Afghanistan clear." This setup looks a little chunky, but when you consider the capability to beam video information from anywhere in the world, it's very impressive. -
Communicating Via Space Dust
klieber writes: "Never mind IP over Avian Carriers, here's a company that really sends data over space dust. Transfer rates are up to 20Kbps, but because the data is transferred in short, sporadic bursts, the usable transfer rate is more like 9600bps. (It uses a proprietary protocol, not IP.) The technology, called "Meteor Burst," has been around since the 1930s and was apparently first developed for the U.S. Military. It's now being targeted at vehicle positioning systems and is being used by a private ambulance company in the Pacific Northwest." This is ... really strange. -
Mail User Agent Comparisons?
tjgoodwin asks: "I'm the SysAdmin in an astronomy department. Our currently supported mail user agent is pine, but I'm looking at alternatives. I'm particularly interested in strong support for qmail's maildirs. I need to support at least one text-based UA: mutt does what I need with maildirs, but is it really suitable for a user base, many of whom are new to Unix? I'm also considering graphical UAs, preferably gnome-based. I've failed to find any useful comparison information (the UNIX Email Software Survey FAQ is seriously out of date). Any pointers?" -
80 Proof Quickies
Let's start this off with some homework: we were nominated for a 2000 Webby in Community. Please go vote for us (requires annoying login, but please do it anyway! I want a crappy little trophy!) Now with the 'biz outta the way, brainsik pointed us to the Brainshaker: a head-mounted subwoofer that looks like it would make Quake a bit too real. Plastik noted a web filter guaranteed to offend the conservative and humorless. But it makes reading Slashdot damn entertaining. And if you're interested in violating most religions, vkulkarn found an "Escort" who apparently reads Slashdot (will she go out with CowboyNeal?) Speaking of religion, Zippy noted that I am apparently a prophet in the Church of The Enlightenment, along with Jay Stile of Stileproject. Illiad, from Userfriendly.org, is a bard. webword sent us CalculusGirls.com, which combines 2 of the many things I don't understand. Andy Lester noted that Brunching Shuttlecocks has a book on "Fuzzy Logic Functions", in the style of O'Reilly. yek401 noted that his English professor builds Barbie doll cyborgs: god bless tenure ;) Trenchcoat Steve warned us about Moon Land Registry, which claims to be selling land on the moon for $10/acre: you even get a deed and mineral rights... and it might be legal! Gravey noted that there are two new Reboot movies going into production. For you conspiracy theorists, backtick noted that everyone's favorite software monopoly might be getting into the furniture biz along with Lazyboy. SgtPepper pointed us to RFC 2795, which "describes a protocol suite which supports an infinite number of monkeys that sit at an infinite number of typewriters". ucsimon noted that LegoLand in California just got a liquor license. Mind you, after a few shots of vodka, finding a 2x2 blue block takes a lot longer. Let's wrap up with jyuter's note that Comedy Central has vid clips of the South Park kids doing Python's parrot sketch in Quicktime or Real. -
Jean-loup Gailly On gzip, go, And Mandrake
Jean-loup is the kind of person I love to see us interview here. He's important in the sense that work he's done (positively) affects almost every Linux or Unix user, but the chance of Jean-loup ever getting any "mainstream" media attention is zero. Or possibly less. Without people like Jean-loup there would be no Open Source movement, and I consider the chance to present him as a Slashdot interview guest a *huge* honor. The readers who asked the excellent questions, and the moderators who helped select them, also deserve major kudos. So thanks to all of you for an excellent Q&A session!
1) bzip2 Support
by Aaron M. Renn
When is gzip going to provide (transparent) support for bzip2 files and the Burrows-Wheeler algorithm?
Will BW be an algorithm option within the gzip file format itself ever?
Gailly:
I have worked very closely with Julian Seward, the author of bzip and bzip2. The goal was to integrate a Burrows-Wheeler algorithm inside zlib 2.0 (upon which gzip 2.0 is based). One of the requirements was to avoid the kind of arithmetic coding used in bzip because of both patent and decoding speed concerns, so Julian wrote the Huffman coding code now used in bzip2. Another requirement was to put the code in library form and Julian did that too.
Unfortunately, Julian decided to release bzip2 independently instead of staying within the gzip 2.0 project. It was mainly my fault, since I couldn't spend enough time on the other parts of the project, and the project was not advancing fast enough. Since Julian left, the project progressed even more slowly, and new blood is obviously necessary, because other responsibilities no longer leave me enough time for gzip. If you're an expert in data compression, e-mail me to convince me that you are the most qualified person to turn the zlib/gzip 2.0 project into an overwhelming success :-)
2) The Data Compression Book
by drudd
I am a happy owner of The Data Compression Book (2nd Ed). With the increasing availability of compression routines within libraries (Java's GZIP streams spring to mind), does this make your book a little unnecessary?
Should software authors continue to write their own compression routines, or simply trust the versions available to them in library form?
I can see some definite advantages to library code, i.e. the ability to upgrade routines, and having standardized algorithms which can be read by any program which utilizes the library.
Gailly:
The compression routines in The Data Compression Book were written mainly for clarity, not for efficiency. The source code is present to help understand how the compression algorithms work. It is not designed to be used as is in other software packages, although it does work if efficiency is not a concern. Consider the book as teaching material, not as a data compression library distributed in printed form.
This doesn't mean that the book is unnecessary. Good data compression libraries don't appear magically; their authors had to learn compression techniques one day. If the book helps one person to get started in the data compression area and this person later writes a great compression library, the book will have been useful.
Judging by the success of my zlib data compression library, I think that a vast majority of software authors prefer using an existing library rather than reinventing the wheel. This is how the open-source model works: building upon the work of others is far more efficient than rewriting everything.
3) Compression patents
by Stephen
The compression world has many patents, notably for Lempel-Ziv compression as used in GIF. What is your view on companies patenting non-obvious algorithms for such processes as data compression?
Gailly:
The worst problem is companies patenting obvious algorithms. There are far more patents on obvious ideas than patents on really innovative ideas. In the data compression area, even something as basic as run-length encoding (replace "aaaaa" with a special code indicating repeat "a" 5 times) has been patented at a time when this technique had been well known and widely used for many years.
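The run-length idea in Gailly's example can be shown in a few lines of C. This is a minimal illustrative sketch, not the patented scheme or any particular file format: each run is stored as a (count, byte) pair with counts capped at 255, so "aaaaa" becomes the two bytes 5, 'a'. The function names are hypothetical.

```c
#include <stddef.h>

/* Naive run-length encoder: each run of identical bytes becomes a
   (count, byte) pair, with runs capped at 255 so the count fits in
   one byte. out must have room for 2 * len bytes in the worst case
   (no repeated bytes). Returns the number of bytes written. */
static size_t rle_encode(const unsigned char *in, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len;) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < len && in[i + run] == b && run < 255)
            run++;
        out[o++] = (unsigned char)run; /* "repeat ..." */
        out[o++] = b;                  /* "... this byte" */
        i += run;
    }
    return o;
}

/* Inverse of rle_encode: expand each (count, byte) pair.
   Returns the number of bytes written to out. */
static size_t rle_decode(const unsigned char *in, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        for (unsigned char k = 0; k < in[i]; k++)
            out[o++] = in[i + 1];
    return o;
}
```

Real RLE formats usually add an escape mechanism so that data without runs does not double in size, which is the obvious weakness of this pair-per-run encoding.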
It is distressing to see the U.S. patent office granting such patents, in contradiction with the law requiring an idea to be both novel and non-obvious to be patentable. Philip Karn has made a good analysis of the problem.
Patents on non-obvious algorithms are a different matter. One view is that algorithms should not be patentable at all, whether obvious or not. This used to be the case, until the US patent office started to grant patents on methods which were nothing more than pure algorithms. I'm afraid that a switch back to the original situation is extremely unlikely.
Several reforms are necessary:
- The patent term should be significantly shortened, at least for algorithms. The patent system was designed to benefit society as a whole, ensuring that new ideas would eventually be made public after a limited period of time instead of being kept as trade secrets. But 20 years is incredibly long in the software area. Granting a monopoly for such a long time no longer benefits society.
- The non-obviousness requirement should be applied much more strictly. A little bit of common sense would avoid a lot of patents on trivial ideas.
- Prior art should be checked more thoroughly. Even non-obvious ideas should not be patented if they have been in use for several years already.
4) A question about Mandrake...
by Mr. Penguin
As we all know, at first Mandrake was little more than a repackaged version of Red Hat. That's changed a bit with the newer versions. My question is this: to what degree will Mandrake continue to differ from RedHat and will there ever be a "developer" version (i.e. one that is centered towards those who are a bit more technically competent)?
Gailly:
That's changed more than a bit. Our distribution is now completely made by us. Believe me, doing everything ourselves represents a significant amount of work. Few people understand how much work is involved in making an independent distribution. We have our own development teams producing things like our graphical install DrakX, our disk partitioner DiskDrake, management of security levels in msec, hardware detection with Lothar, etc... Our packages are more recent than those of Red Hat and have more functionality (such as supermount support in the kernel). Red Hat is now even copying packages made by MandrakeSoft (e.g. rpmlint). I hate having to speak like a salesman here, but it is really unfair to say that Mandrake just repackages RedHat; this is simply not true anymore.
Have you looked at Linux-Mandrake 7.0? It does include a developer version. At install time, select the option "Custom" then "Development". You will get all necessary development tools. We, as developers, use our own distribution :-)
5) Why is Mandrake better than Red Hat?
I guess that you have at least a little something to say about this.
Is the 586 optimization enough to justify Mandrake's position? Are you especially proud of any of the architectural differences between the distributions (from what I have been told, the Apache-PHP layout is quite a bit different).
How do you feel about the steps that Red Hat has taken to change their distribution in reaction to yours?
Gailly:
Mandrake is far more than Red Hat plus 586 optimization. It is an independent distribution. (See the answer to A question about Mandrake above.) We have enhanced some packages (such as the kernel or Apache) to provide additional functionality for users.
It's clear that Mandrake pushes Red Hat to improve its own version and nowadays Red Hat includes some development from MandrakeSoft. There is a coopetition: Red Hat and MandrakeSoft both benefit from the same open-source community, but they compete for the customer. This coopetition is fully beneficial for the Linux users since we both need to constantly improve our version. We just make sure that Mandrake stays ahead :-)
6) Winzip
by Uruk
I noticed that you allowed the people who make the Winzip product to incorporate code written for Gzip. I think it's cool that you did that, because it would be horrible if winzip couldn't handle the gzip format, but at the same time, what are your thoughts about allowing free software code to be included in closed-source products?
Just out of curiosity, (tell me it's none of my business if you want to and I'll be OK with that) did you receive a licensing fee from the company that makes Winzip for the code?
Gailly:
I started writing compression code simply because my 20 MB hard disk, the biggest size one could get at the time, was always full. I didn't write it for money. Even after I got a bigger hard disk, I continued writing compression code for fun. In particular I was not interested in writing a Windows interface. This is why I allowed my code to be used in Winzip. I received exactly $0 for this.
The zlib license also allows it to be used in closed-source products. This was an absolute requirement for the success of the PNG image format, which relies on zlib for data compression. If we had used a GPL license, Netscape and Microsoft Internet Explorer wouldn't support PNG, and the PNG format would be dead by now. I also received $0 for zlib, if you're curious...
Even though I allowed my code to be used in closed-source products, I am a strong supporter of the open-source model. That's also why I work for MandrakeSoft. The open-source model is getting so much momentum that it will in the end dominate the software industry.
7) What about wavelets?
by Tom Womack
The Data Compression Book was an excellent reference when it came out, but there are some hot topics in compression that it doesn't cover - frequency-domain lossy audio techniques (MP3), video techniques (MPEG2 and especially MPEG4), wavelets (Sorenson video uses these, I believe, and JPEG2000 will), and the Burrows-Wheeler transform from bzip.
Do you have any plans for a new edition of the book, or good Web references for these techniques? BZip is covered well by a Digital research note, but documentation for MPEG2 seems only to exist as source code and I can't find anything concrete about using wavelets for compression. The data is all there on the comp.compression FAQ, but the excellent exposition of the book is sorely lacking.
Gailly:
These are all very worthy topics, and Mark Nelson and I would like to incorporate them into a new version of the book someday. However, the decision to produce a new version is taken by the publisher, not us.
Note also that these are all very big topics, and it would be quite easy to write an entire book on each one. I don't think they will fit well in a chapter or two. Covering JPEG in one chapter was difficult, and Mark Nelson has been criticized for not describing the specifics of the standard algorithm.
You can find some Web references here and there, in addition to the comp.compression FAQ.
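The question mentions the Burrows-Wheeler transform used by bzip. As a rough sketch of the idea only (not bzip2's actual code, which uses efficient suffix sorting and follows the transform with further stages such as move-to-front and Huffman coding), the naive Python version below sorts all rotations of the input and keeps the last column. The function names and the `\0` end-of-string sentinel are illustrative choices, not part of any real library.

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort every rotation, keep the last column."""
    assert sentinel not in s, "sentinel must not occur in the input"
    s = s + sentinel  # unique end marker makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort.

    After len(last) rounds the table holds every sorted rotation; the
    original string is the rotation that ends with the sentinel.
    """
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]  # drop the sentinel


print(repr(bwt("banana")))  # 'annb\x00aa'
print(inverse_bwt(bwt("banana")))  # banana
```

Note how equal letters cluster in the transformed output ("nn", "aa"); such runs are what the later compression stages exploit. Building every rotation takes quadratic memory, so this version is only suitable for short strings.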
8) Compression software
by jd
It is a "truism" in the Free Software community that code should be released early and released often.
However, much of the software you've written has started gathering a few grey hairs. Gzip, for example, has been at 1.2.4 for many, many moons, and looks about ready to collect its gold watch.
Is compression software in a category that inherently plateaus quickly, so that significant further work simply isn't possible? Or is there some other reason, such as Real Life(tm) intruding and preventing any substantial development?
(I noticed, for example, a patch for >4 GB files for gzip, which could have been rolled into the master sources to make a 1.2.5. This hasn't happened.)
Gailly:
I knew this question would come when I accepted a Slashdot interview. But I had to face it :-(
In short, you are completely right. While working on gzip 2.0, I continued to maintain gzip 1.x, accumulating small patches and answering a lot of e-mail. But I was hoping to be able to release gzip 2.0 directly, without having to make an intermediate 1.x release. See my answer to the question "bzip2 support" concerning the state of gzip 2.0 and the Real Life interference. I'd be glad to hand over all my patches for 1.2.4 to the person who can help me get the gzip 2.0 project up to full speed.
9) Proprietary algorithms
by Tom Womack
The field of compression has been thronged with patents for a long time - but patents at least reveal the algorithm.
What do you think of the expansion of trade-secret algorithms (MP3 quantisation tables, Sorenson, RealAudio and RealVideo, Microsoft Streaming Media) where the format of the data stream is not documented anywhere?
Gailly:
The hardware specifications for some video cards were kept as trade secrets. As a result, the XFree86 project couldn't support these cards. Increasing pressure from users who didn't buy those cards because they couldn't be supported has led the manufacturers to release the hardware specifications, and those cards are now well supported.
Similarly, I think that pressure from the open-source community can become strong enough to force companies to open their formats. We're not completely there yet, but I believe that the open-source model will win in the end. Even a giant like Microsoft is starting to consider Linux a real threat.
10) Go and Compression
by Inquisiter
When I think of a game like go or chess, I think that each player develops their own algorithm to beat their opponent. If you agree, what relationships or similarities do you see between your interest in Go and your interest in compression?
Gailly:
What a nice question!
Even though the rules of go are very simple, the complexity of go is astonishing. The best go programs can be beaten by a human beginner. The search space in go is so large that it is impossible to apply the techniques that are so successful in chess. Professional go players never evaluate all possible moves. They are able to compress an enormous amount of information into a relatively small number of concepts.
Where a human beginner would have to painfully examine many possibilities to realize that a certain group is doomed, and would most likely fail in the process, a go expert can immediately recognize certain shapes and can very quickly determine the status of a group. One gets stronger at the game by reaching higher levels of abstraction, which are in effect better compression ratios. A professional go player can elaborate concepts that an average player would have great difficulty understanding.
Current go programs are overwhelmed by the amount of information present in a game of go. They are unable to understand what is really going on. Since brute-force techniques can't work in go, programs will only improve by compressing the available information down to a manageable level.