Denial of Service via Algorithmic Complexity
dss902 writes "We (Department of Computer Science, Rice University) present a new class of low-bandwidth denial of service attacks that exploit algorithmic deficiencies in many common applications' data structures... Using bandwidth less than a typical dialup modem, we can bring a dedicated Bro server to its knees; after six minutes of carefully chosen packets, our Bro server was dropping as much as 71% of its traffic and consuming all of its CPU. We show how modern universal hashing techniques can yield performance comparable to commonplace hash functions while being provably secure against these attacks."
Several years ago Apache had a function that ran in O(n^2) time, and a HTTP GET request was capable of throwing that function into taking a very long time to process, making it easy to DoS an Apache httpd. The interesting part is that the same function could have been done in O(n) time, but wasnt. The Apache team fixed this by substituting the O(n) algorithm.
The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
The issue is that even the best hash functions perform worse under very specific circumstances, i.e. high numbers of hash collisions. An earlier poster mentioned quicksort; the quicksort algorithm, while being the most (or one of the most) efficient sorting algorithms, is extremely inefficient if given an ordered set already. Obviously, a check can be added to make sure you are not sorting an already sorted algorithm.
Presumably, the same could be done for hash functions, in which, by having a bunch of different hash functions available, the software could revert to an alternate if the primary function appeared to be generating way too many collisions. Just because people hadn't thought about this much or designed software to cope doesn't mean it would be very hard to prevent against.
I found your paper very interesting. I'd like to address a couple things.
1. You point out that MD5 is vulnerable to a brute-force search for bucket collisions. Isn't any deterministic hash-function vulnerable to the same attack? I know you solved the problem using a keyed version of MD5. With Carter and Wegman, you alluded to a randomly chosen constant and vector. I didn't notice you addressing this issue with UMAC.
2. The abstract says, "We show how modern universal hashing techniques can yield performance comparable to commonplace hash functions while being provably secure against these attacks." The abstract is ambiguous, but it insinuates that you'll provide a proof. I didn't see one. Perhaps it was in the references? Even if it was, certainly your 20 bit Carter-Wegman construction merits a new proof.
3. You said, "When analyzing algorithmic complexity attacks, we must assume the attacker has access to the source code of the application," I disagree. The attacker doesn't need the source code. They can reverse engineer the compiled binary.
Also, I wonder if an attacker even needs the program. An attacker could reasonably guess that a Perl script will store a certain string in an associative array. Many websites automatically process their Apache logs with Perl. Instead of requesting real pages, a blackhat could request strings that will hash to the same bucket (assuming the site uses Perl). When cron starts processing the logs, the website could slow to a crawl. Granted, the attacker knows the hash function from Perl, but they don't have access to the website's custom-made script.
4. You have a spelling error. Your paper should read, "due to the ten times larger queues of unprocessed events".