Slashdot Mirror


Tracing the Limits of Computation

An anonymous reader writes: For more than 40 years, researchers had been trying to find a better way to compare two arbitrary strings of characters, such as the long strings of chemical letters within DNA molecules. The most widely used algorithm is slow and not all that clever: It proceeds step-by-step down the two lists, comparing values at each step. If a better method to calculate this "edit distance" could be found, researchers would be able to quickly compare full genomes or large data sets, and computer scientists would have a powerful new tool with which they could attempt to solve additional problems in the field.

Yet in a paper presented at the ACM Symposium on Theory of Computing, two researchers from the Massachusetts Institute of Technology put forth a mathematical proof that the current best algorithm was "optimal" — in other words, that finding a more efficient way to compute edit distance was mathematically impossible. But researchers aren't quite ready to record the time of death. One significant loophole remains. The impossibility result is only true if another, famously unproven statement called the strong exponential time hypothesis (SETH) is also true.

2 of 82 comments (clear)

  1. This reminds me... by Dutch+Gun · · Score: 5, Interesting

    This article reminded me of learning about string-matching algorithms as a student. When you first naively implement a string-matching algorithm as a student yourself (sans prior research), you'd generally start with what I'd call the string-matching equivalent of a bubble sort - it works, but it's hopelessly inefficient, as you begin a simple search with each new character in the source text, backtracking if the match doesn't occur, and checking the next potential match.

    When you try to get a bit more clever, you realize there's probably a way to ensure that you can get examine each character in the searched text once and only once, never having to backtrack at all, and you get something like the Knuth–Morris–Pratt algorithm.

    However, it takes a fairly impressive leap of intuition / invention to get to the Boyer–Moore string search algorithm, which actually allows you to skip over checking some characters entirely, which I would have intuitively thought impossible without making that mental leap. Learning about these impressive algorithms was one of my favorite part of the Computer Science curriculum when I was in school.

    It will be interesting to see if someone eventually breaks through the current state-of-the-art limits of string comparison in our lifetime. It would be a bit sad if the hypothetical maximum efficiency were already reached, as was predicted by these mathematical models, because the current best algorithms still require a lot of computation time. I've long since devoted myself to more practical topics as a working programmer, but it's fun to peek into what's happening at the bleeding edge of pure computer science research.

    --
    Irony: Agile development has too much intertia to be abandoned now.
  2. Inaccurate Summary by Anonymous Coward · · Score: 3, Interesting

    First, having worked on software for DNA sequence alignment years ago, I don't think anyone thought they could do better than Smith-Waterman (a minor DNA-centric extension of a DP algorithm to find Levenshtein distance). Rather, everyone was using clever implementation techniques (SIMD instructions, GPUs, FPGAs) and heuristics to limit the amount of brute-force aligning. Oh, and lots and lots of hardware.

    Second, the summary author thinks that a dynamic programming algorithm isn't clever? They're either quite arrogant or don't know what they're talking about. Quicksort must be downright stupid, eh? Something for the short bus kids to play with.