Slashdot Mirror


Tracing the Limits of Computation

An anonymous reader writes: For more than 40 years, researchers had been trying to find a better way to compare two arbitrary strings of characters, such as the long strings of chemical letters within DNA molecules. The most widely used algorithm is slow and not all that clever: It proceeds step-by-step down the two lists, comparing values at each step. If a better method to calculate this "edit distance" could be found, researchers would be able to quickly compare full genomes or large data sets, and computer scientists would have a powerful new tool with which they could attempt to solve additional problems in the field.

Yet in a paper presented at the ACM Symposium on Theory of Computing, two researchers from the Massachusetts Institute of Technology put forth a mathematical proof that the current best algorithm was "optimal" — in other words, that finding a more efficient way to compute edit distance was mathematically impossible. But researchers aren't quite ready to record the time of death. One significant loophole remains. The impossibility result is only true if another, famously unproven statement called the strong exponential time hypothesis (SETH) is also true.

1 of 82 comments (clear)

  1. Re:The algorithm isn't clever, but scales well. by Required+Snark · · Score: 3, Informative
    The formal description of the edit distance is different from that described in the article. This is the one used in some cases of DNA or text comparison.

    Insertion of a single symbol. If a = uv, then inserting the symbol x produces uxv. This can also be denoted e->x, using e to denote the empty string.

    Deletion of a single symbol changes uxv to uv (x->e).

    Substitution of a single symbol x for a symbol y x changes uxv to uyv (x->y).

    It's not just about changing a symbol, it also includes adding and deleting symbols. That makes simple parallelization impossible.

    It's not clear if the mathematical proof cited in the article is the simple one of only changing symbols, or the more complex case of symbol addition and deletion. From the article, it appears that they are talking about the simple case, which makes the more complex algorithm even more intractable.

    --
    Why is Snark Required?