Slashdot Mirror


Tetris Is Hard To Test

New submitter JackDW writes: Tetris is one of the best-known computer games ever made. It's easy to play but hard to master, and it's based on a NP-hard problem. But that's not all that's difficult about it. Though it's simple enough to be implemented in one line of BBC BASIC, it's complex enough to be really hard to thoroughly test.

It may seem like you can test everything in Tetris just by playing it for a few minutes, but this is very unlikely! As I explain in this article, the game is filled with special cases that rarely occur in normal play, and these can only be easily found with the help of a coverage tool.

37 of 169 comments (clear)

  1. One line? by GrahamCox · · Score: 4, Funny

    If anybody wrote code like that for me, they'd be made to sit on the naughty step and think very, very hard about what they'd done.

    1. Re:One line? by ShanghaiBill · · Score: 2, Informative

      Really, who couldn't love code like this:

      Except that is not "one line". It is six lines. Any program can be a "one-liner" if there is no limit on the line length. Well, unless you writing it in Python.

      Also, as long as I am on a rant, Tetris is NOT NP-Hard, since the arrival of the blocks is probabilistic. It is only if the entire sequence of blocks is known in advance that it becomes NP-Hard. But that doesn't happen in actual play.

    2. Re:One line? by lgw · · Score: 4, Informative

      Except that is not "one line". It is six lines. Any program can be a "one-liner" if there is no limit on the line length. Well, unless you writing it in Python.

      The line length limit is 256 bytes, of course. And these hacks are the basic-equivalent of the C obfuscation contest.

      As the authors say: "I'd like to think it is self documenting. The code speaks for itself; even if what it has to say is not very nice."

      --
      Socialism: a lie told by totalitarians and believed by fools.
    3. Re:One line? by BlackHawk-666 · · Score: 4, Informative

      In ASCII, but many BASICs will reduce keywords down to a single byte.

      --
      All those moments will be lost in time, like tears in rain.
    4. Re:One line? by Anonymous Coward · · Score: 2, Informative

      Except that is not "one line". It is six lines. Any program can be a "one-liner" if there is no limit on the line length. Well, unless you writing it in Python.

      The line length limit is 256 bytes, of course. And these hacks are the basic-equivalent of the C obfuscation contest.

      As the authors say: "I'd like to think it is self documenting. The code speaks for itself; even if what it has to say is not very nice."

      And here we get to the core of the problem. The presenter passes off information in an incorrect light, so only the audience that cares to accept it will continue accepting his later statements.

      One of those statements is that code coverage tells you what to test. It doesn't. It tells you what you haven't bothered to test. What you need to test is driven by the user stories, customer requirements, and other bits of development documentation. Testers writing tests for permutations of non-important functionality are wasting company resources, especially if the important functionality was treated with a light "get the coverage up" style of testing.

      A real test doesn't make the lines of code tested the goal. It makes corner cases, edge conditions, unexpected value selection, illegal parameter passing, etc. of the methods most likely to be used the point. Then, if there is extra time, it extends those tests to the next highest set of code, ordered by business value.

      Stating that lines of code is the goal is a metric we have proven to be useless again and again (since the 1980's); so, where is the compelling evidence that it suddenly became valuable in the realm of unit testing? Not covering code with any tests is a red flag, if the code matters. Covering all code with the lightest battery of tests to get the line count up is money spent without much value.

    5. Re:One line? by K.+S.+Kyosuke · · Score: 2

      It would be probably shorter in Forth. ;-)

      --
      Ezekiel 23:20
    6. Re:One line? by LordKronos · · Score: 2

      Actually, it is written for resource efficiency...specifically program size, which uses memory. The goal was to write a 1 line program, and in BBC Basic, that meant they were limited to 256 characters. Yes, maybe they could have wrote things with more verbose naming and had it compile to the same size, but the particular goal there was to write something big with little code. I think they accomplished it fairly well, and probably 95% (at least) of programmers would be hard pressed to replicate their results. I suspect the people capable of accomplishing that would be capable of similar accomplishments on an embedded platform, given the necessary experience with the particular hardware.

    7. Re:One line? by Ed+Avis · · Score: 3, Insightful

      It's analogous to testing itself. Testing cannot prove the absence of bugs, though it can find them. Similarly a coverage check cannot show that your test suite is adequate, but it can show it to be inadequate (or perhaps reveal dead code to prune). Nobody is claiming that coverage is the be all and end all of testing. That does not mean it is useless to measure it.

      --
      -- Ed Avis ed@membled.com
  2. in Soviet Russia by Trepidity · · Score: 4, Funny

    It may seem like you can test everything in Tetris just by playing it for a few minutes, but this is very unlikely! As I explain in this article, the game is filled with special cases that rarely occur in normal play, and these can only be easily found with the help of a coverage tool.

    Tetris doesn't need coverage tool to test you. Everything about you.

    Code-coverage tool is crutch for weak capitalist engineer. Tetris is Soviet technology, forged by people's will.

    1. Re:in Soviet Russia by flargleblarg · · Score: 2

      Tetris doesn't need coverage tool to test you. Everything about you.

      So what you're saying is...

      In Soviet Russia, Tetris game tests you!

  3. Nice advertisement by Anonymous Coward · · Score: 5, Insightful

    From a company promoting automated WCET analysis. Hah!

    1. Re:Nice advertisement by phantomfive · · Score: 4, Insightful

      Normal users don't test all cases of a game.

      Maybe not, but as soon as you tell yourself, "I don't need to test this code, a normal user will never get to it;" you can be certain that after saying that, a user will find a way to break it. The Gods of Eternity will laugh at you.

      --
      "First they came for the slanderers and i said nothing."
    2. Re:Nice advertisement by JackDW · · Score: 2

      Submitter here. It's "marketing spam" in the sense that it's based on something I did at work. I don't see why this is a problem. Many articles linked from this site involve something that someone did at work.

      I thought it was interesting that, though this is a really simple game, you can't test it effectively just by playing it. You have to deliberately seek out all of special cases. That's a fact about virtually all software, but it's not an intuitive one, and that's what the article is about.

      --
      You're an immobile computer, remember?
  4. replacing line feeds with terminators is not a 1-l by raymorris · · Score: 4, Informative

    In BBC BASIC, a colon is a statement terminator, much like a semicolon in languages with C-style syntax. The linked code is therefore not a one-liner by any meaningful definition of the term. One could replace all of the linefeeds in Linux kernel source with semicolons and other appropriate terminators. That wouldn't make the kernel a one-liner.

  5. Re:Perl-standard line length by JackDW · · Score: 3, Informative

    A BBC BASIC line is limited to 256 bytes, including the line number. It is impressive to squeeze the whole game into that space. There's also a 140-byte Javascript version, also very impressive, though it has fewer features and doesn't have all the block types.

    --
    You're an immobile computer, remember?
  6. Re:replacing line feeds with terminators is not a by Nyder · · Score: 2

    In BBC BASIC, a colon is a statement terminator, much like a semicolon in languages with C-style syntax. The linked code is therefore not a one-liner by any meaningful definition of the term. One could replace all of the linefeeds in Linux kernel source with semicolons and other appropriate terminators. That wouldn't make the kernel a one-liner.

    If it's a one-line program, why is it more than one line?

    A line of code is not the same as a line on a screen. The program won't fit on one line on most screens, but it will fit on one line of BBC BASIC, which fortunately has a well defined, but short maximum length of 256 characters. The whole program is 257 bytes as there is an extra byte to mark the end of the program.

    from: http://survex.com/~olly/rheoli...

    --
    Be seeing you...
  7. Re:One Line by dgatwood · · Score: 3, Informative

    It's simple enough to implement in a shell script. At least three or four of us have done it over the years.

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.

  8. perhaps. I wonder if it NP-hard by raymorris · · Score: 4, Interesting

    The fact that probability is involved doesn't mean there's not an optimal strategy, of course, where optimal is defined as "highest expected score" (score X probability). So figuring out an optimal strategy is a hard problem - how hard is it?

    If the probability of a certain series of shapes coming next were 100%, we'd have an NP-hard problem, agreed? Does another probability make it easier or harder? Harder, if anything. That's provable because the probability version can be solved by solving each of the potential series as if each were known. What's harder than NP-hard? It may well still be NP-hard. It can't be of any more solvable complexity class.

    1. Re:perhaps. I wonder if it NP-hard by phantomfive · · Score: 4, Funny

      What's harder than NP-hard?

      Intractable.

      --
      "First they came for the slanderers and i said nothing."
    2. Re:perhaps. I wonder if it NP-hard by allo · · Score: 3, Informative

      I disagree.
      For a stochastic process your greedy "take the option with the best chance" algorithm may work, it may fail completely, just depending on the random numbers. If you have an stochastic polynomial algorithm, you have a chance to get the same or better expectation value than your "optimize global then choose greedy" algorithm. Both approaches may win or fail, but in the deterministic game the np-complete version always wins, while the "shortcut" version cannot compete. In the stochastic version, the shortcut may be as good as the optimal solution, because you cannot get the global optimum anyway so choosing a local one may be a good choice.

  9. Infomercial for a code coverage tool? by 140Mandak262Jamuna · · Score: 4, Interesting
    Code coverage tool seems like a good idea from some theoretical stand point. But in practice number of code paths multiply rapidly and getting all the paths executed would involve unreasonably long time. Further rarely called procedures or rarely executed is just one class of problems. There are functions that will execute million times correctly and misbehave once in a million or once in a billion calls. For example I came across a bug in something so simple like calculating the centroid of a triangle. Absurdly simply code that adds the x, y, z coordinates of the vertices and divides by 3. That is all. In dealing with output of some CAD software, when the smallest angle of the triangle fell below 1.0e-08 radians, it returned a wrong value of the centroid. Typical sanity checks based on mathematical facts, like centroid of a triangle can never be outside the triangle will not work. The code that checks inside-or-outside of triangle is far more complex than the centroid code. The floating point truncation errors make this kind of sanity check useless. You can't even plot it on the screen and look at the centroid. OpenGL is implemented in single precision.

    So at some point you reach a point of diminishing returns. It might not be worth making sure every line got tested when there are procedures that have a bug that happens in one in a billion calls. My philosophy is, "Perfection is the goal. Doing better than the last release is the shipping criterion".

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    1. Re:Infomercial for a code coverage tool? by Anonymous+Brave+Guy · · Score: 2

      All that code coverage does is let you focus on what has not been touched, then you'll be able to test it somehow.

      The trouble is that what you really need to test isn't how much coverage of the code you've got, but how much coverage of the possible input space. More specifically, you ideally want to know that each distinct combination of inputs that will cause a different type of behaviour in the code has been considered.

      Of course, this is typically an implausibly difficult problem to solve in real world projects. To see why, consider that this article proudly claimed that finding the special case of clearing 4 lines together twice in a row was easy with their tool, and it also said that there were similar combo special cases for 3, 2 or 1 lines, but it conveniently overlooked the possibility of code that ran in all four cases and took the number of lines as a parameter. Testing for any one of those four cases would count as coverage with most tools, but wouldn't guard against implicit conditionals like overflow/underflow that might be relevant to some cases but not others, nor for behaviours that arose only with certain combinations of multiple explicit conditions.

      Coverage tools are useful up to a point, but not nearly the silver bullets that these kinds of article suggest.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  10. Nonsense -- make your own test suite by rs1n · · Score: 4, Insightful

    Why would you only test your code via normal use? Why wouldn't you just create a test suite that actually tests all the scenarios? In the case of tetris, you can simply force a sequence of pieces that will enable you to reach the scenarios described in the article. Or you can even start the game with a pre-made board.

    Has slashdot really become a means for tech companies to inject free advertisement by a simple blog post made to look like real journalism?

    1. Re:Nonsense -- make your own test suite by NormalVisual · · Score: 2

      Why wouldn't you just create a test suite that actually tests all the scenarios?

      Defining all of the possible scenarios is often a lot harder than it looks. There aren't too many UI coders out there that haven't said "yeah, we need to fix it, but what made the user decide to do that?" at one time or another.

      --
      Please stand clear of the doors, por favor mantenganse alejado de las puertas
  11. Beware coverage tools by Anonymous Coward · · Score: 3, Insightful

    The article makes it sound like coverage tools help! If you're not familiar with them, they tell you which bits of code have been run, not how many of the N cases of that code have been executed.

    So the code might fail with a particular combination of inputs, but the coverage tool is more interested in which bits of the code have been execute.

    It's one of these tools and metrics that non-technical managers use to substitute for an ability to read code.

  12. Re:replacing line feeds with terminators is not a by NormalVisual · · Score: 2

    46 lines (statements), actually. There's a linked page where he blows it out into a readable form and explains how it works in depth.

    --
    Please stand clear of the doors, por favor mantenganse alejado de las puertas
  13. Re:replacing line feeds with terminators is not a by phantomfive · · Score: 2

    In BBC Basic, a single line is 255 characters. RTFA.

    --
    "First they came for the slanderers and i said nothing."
  14. C64 BASIC version within a screen of code by Neo-Rio-101 · · Score: 3, Interesting

    It's a 15 liner.
    Note that the {CBM-x} represents the graphic on that particular key (press the c= key and the letter to produce it)

    1 a$="efijefijefijefijbfjnhijkbfjnhijkijfgaefjijfgaefjefjkiefbefjkiefbbfjidefj"
    2 a$=a$+"abeieijkaeijijkgabfjiefgehijebfj@abe@dhe":o=207:dime(o):forx=0to111
    3 print," {CBM-M}"," {CBM-G}":p=asc(mid$(a$,x+1)):e(x)=(pand3)+(pand12)*10:next:m=2024
    4 print," {CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}{CBM-T}":gosub6:goto7
    5 pokei+e(r),c:pokei+e(r+1),c:pokei+e(r+2),c:pokei+e(r+3),c:c=160:r$="d":return
    6 i=1152:r=h:c=32:gosub5:j=int(rnd(0)*7)*16:r=j:gosub5:r=h:h=j:i=i+9:return
    7 gosub6:w=i:t=i:g=r:k=240:l=1278
    8 gosub15:c=32:gosub5:r=-r*b-g*notb:g=r
    9 i=w:w=i+40:gosub5:gosub15:ifbthen12
    10 getk$:g=randkor((r-4*(k$="s"))and15)
    11 t=w:w=w+(k$=l$)-(k$=r$):l=l+40:goto8
    12 c=o:gosub5:m=m-(l0:w=-t*b-w*notb:l$="a":return

    --
    READY.
    PRINT ""+-0
  15. erratum: missing lines by Neo-Rio-101 · · Score: 2

    13 fors=ltol+(m-160-l)*qstep-1:pokes+11,peek(s-29):next:y=y-q:m=m+q*40:z=z+q
    14 l=l+40*not-q:next:print"{home}";z:goto7
    15 b=0:forx=0to3:b=bor1andpeek(w+e(g+x)):next:b=b>0:w=-t*b-w*notb:l$="a":return

    {home} is the home character blob (appears as reverse S character) within quotes

    Use A,S,D keys to rotate and drop the blocks
    Have fun!

    --
    READY.
    PRINT ""+-0
  16. DMCA incoming by tepples · · Score: 2

    How long before The Tetris Company sends a notice of claimed infringement to Slashdot's designated agent, claiming that this post and others like it infringes the copyright in Tetris ?

  17. cats are mammals, not all mammals are cats by raymorris · · Score: 2

    A line can be no more than 256 characters. That doesn't mean that the following is one line:

    foreach mammal in pets
            print mammal ' "is a mammal"
            if (is_cat(mammal) {
                    print " and also a cat"
            }
    }

    Just because all cats are mammals doesn't mean that all mammals are cats.
    Just because all one-liners are less than 257 characters doesn't mean that all programs less than 257 characters are one-liners.

  18. Re:replacing line feeds with terminators is not a by psmears · · Score: 2

    46 lines (statements), actually

    No, statements are not the same as lines. Lines have real semantic significance in BBC Basic, in a few different ways: for one, GOSUB-type subroutines can only start at the start of a line (because that's where the line number is), and you also can't terminate an "if" without starting a new line. That (plus the 256-byte limit) makes writing one-liners in the language more of a challenge than in other languages where line breaks genuinely aren't significant.

  19. Re:Sort of spammy, also not convincing by wrook · · Score: 3, Insightful

    Code coverage tools will not tell you if your tests are sufficient. They simply tell you what lines of code were hit. They don't tell you whether or not the line of code was hit while doing a meaningful test. In fact, it is trivially easy to write "tests" that exercise 100% of the code but have no expectations at all.

    What code coverage tools tell you is what code you definitely haven't tested. If you haven't run that line of code in your tests, you definitely haven't tested it. This is useful information, but not essential if you have a good team. My current team is quite comfortable writing tests. We do most things TDD and without trying hard our average code coverage is 96%. I occasionally wander through the other 4% to see if it is worth testing and most of the time it isn't. Occasionally I will find the odd piece of logic that was jammed in hurriedly without tests, but on our team it is quite rare. On the other hand, I have worked on teams that were not comfortable writing tests and mostly wrote them after writing production code. On those teams we would get about 75% test coverage with holes you could drive a bus through. A code coverage tool was very useful for educating people on the need to improve the way they wrote tests.

    I feel very confident I could TDD my way through a tetris implementation and get 100% code coverage without undue effort. I don't think I would find all of the corner cases without help, though. A code coverage tool wouldn't help me in that instance.

  20. Re:replacing line feeds with terminators is not a by itzly · · Score: 2

    The nice part about putting the entire Linux kernel on a single line is that you always know the exact line number of the bug.

  21. Re:defined as expected value by allo · · Score: 2

    Jep, and your strategy gets worse, while another strategy may stay average.

    Assume you flip a coin. heads or tails.

    You NP-complete algorithm knows the sequence.
    The probalistic one just guesses "heads".

    Now the Expectation value of both are Zero in the stochastic case. In the deterministic one its infinite win for the np-complete one and zero for the probalistic one.
    So this is an example, where a perfect algorithm for deterministic data is just as good as another for stochastic data.
    This does not mean, you cannot get better than the deterministic-optimal algorithm. As the data is random, the deterministic-optimal algorithm becomes "just another algorithm" and you will need to proof again its better than the other candidates. It may be, maybe even provable the still the optimal one, but the fact that its det-optimal does not imply that its prob-optimal.

    A real world example: Las Vegas Quicksort.
    Quicksort is O(n log n), choosing log n times a median element, then sorting in n the elements on both sides to the correct side.

    calculating the median element is O(n), with some constant C.
    Choosing the median element randomly is O(1).
    Now you can show, that the propability of a "not too bad" element is so, that the log n * C factor is slower in most cases than the slowdown of the algorithm by choosing a non-optimal "median".

  22. if I understand your point by raymorris · · Score: 2

    Let me see if I correctly understand your point. Are you saying:

    The best algorithm for a deterministic sequence may be / is NP-hard.
    Best best algorithm for the stochastic sequence may be different.
    Therefore, the best algorithm for the stochastic sequence may be easier than NP-hard.

    That seems to make sense. Until you realize the deterministic sequence IS one case of the stochastic - where the probabilityof a certain sequence happens to be 1.00. If you had a polynomial algorithm for probability X, you could run it for X=1 to solve the deterministic version. Thus, if the deterministic version is NP-hard, so must be the stochastic version in general.

    Of course, one specific, special case probability matrix may be solvable more readily. As someone else pointed out, if you decide that your specific version of Tetris generates square pieces always, the algorithm is trivial. However, if we seek an algorithm for any version of Tetris, any probability of any given piece, that will include probability 1 of a given sequence. Thus, the generalized stochastic includes the deterministic as one case and therefore must be at least as complex as that case.

    1. Re:if I understand your point by allo · · Score: 2

      no, you got it only the half way.

      deterministic:
      best = np-hard, perfekt
      other: polynomial, good average

      stochastic:
      best: np-hard, not perfect, quality unknown
      other: polynomial, good average

      the point is not the runtime complexity, but the result. while the best algorithm cannot be beaten on the det. sequence, it may fail completely (in terms of quality) on a sequence without full information. If you got a good polynomial one with an average result, it may be better for many sequences.

      one example may be an perfect algorithm with a lookup table for all sequences and a greedy algorithm.
      With the random sequence, the "perfect" algorithm needs to choose the sequence which is most likely for its next move, discarding a lot of other strategies. Now it may have chosen the worst strategy for the next random and unlikely event. The other algorithm optimizes locally anyway and will continue in both cases as if it does not know the next event.