GCHQ Challenge Solution Explained
First time accepted submitter DrDevil writes "The British spy agency GCHQ recently published a puzzle at canyoucrackit.co.uk (as featured on Slashdot), now just a few days later an academic at the University of Greenwich in England has posted a full video explanation of the puzzle. The puzzle has three stages and is not at all simple — likely to challenge even the best computer science graduates."
I didn't give the challenge a serious go, but stage 1 just seems convoluted - why is it the mark of a good code cracker to recognise x86 bytecode?
To prevent this day from getting worse, I'll just read ERROR as GOOD TH
You can probably find some examples by doing a search for the start of the code. The problem right now is that all the hits are on the challenge itself.
My opinion, as someone who has both solved and organized several challenges of this sort, is that the challenge is neither hard (at least by the standards of the ones I've dealt with) nor well designed. In fact, it kind of degenerates: it starts out OK but the ending is terrible.
Stage 1 is interesting: it combines recognizing executable code (the first thing I thought when I stared at that hex dump is "this looks like x86 code", but being able to recognize binary architectures is a valuable skill) combined with some steganography (fishing out the rest of the required data from the PNG. Fair enough, and OK for a first round.
Stage 2 starts out well: virtual machines are used for obfuscation and make fun challenges. However, the execution is backwards. Being given VM bytecode and a specificiation to implement a VM isn't a hacking or reverse engineering challenge; it's just work ("go implement this for me"). A much better challenge would be to be given either the spec or (preferably) code that implements it, and then have to reverse engineer the bytecode itself to solve the puzzle. That involves writing a custom disassembler, which is a much more interesting task.
Stage 3 is a clusterfuck. It's just an executable that checks for a few constants in a file and then builds a URL out of the rest of it. There's a hash (old-school DES crypt() salted password) that the input has to match, but even though it's crackable using a dictionary, you don't even have to do that because the URL includes the hash (which is in the executable), not the plaintext! The rest of the URL isn't checked, and it's basically a guessing game where you have to fish out constants from previous levels. It's just a glorified way of saying "okay, now take a wild guess as to what numbers to stick in the URL". It's not realistic in the slightest.
Anyone interested in a "better stage 2" might want to check out a level that I put together for the Hack-It competition at the 18th Euskal Encounter (2010). Your goal is to figure out the 64-bit input key that works (if you don't know what "works" means, compile and run the code and it should be obvious). The full set of challenges can be found here: 2010 2011 (unfortunately, the website / problem statements are in Spanish, but I'm sure you can work it out with a bit of copy/pasting into Google Translate - if there's enough interest I'll translate them to English).
So if you can't crack it, but you can bypass the challenge, do you still win?
http://www.canyoucrackit.co.uk/soyoudidit.asp
I'm aware that the solution has been leaking out onto the net...
Starting later than most, in spare time, I've trudged through stages One and Two... I've been playing with the stage-3 executable and have disassembled it... though there remains further tedious trudging for me to demonstrate by sensible sequential steps how to go about solving stage-3.
I'm finding it difficult to convince myself that it's worth the effort... I'm sure I can fathom any remaining steps - based upon the fact that there has been little about stages one and two that was actually 'challenging'. It seems silly to plod onwards without 'cheating'.
I was interested principally to try and find out what sort of skills GCHQ actually want... I never assumed I'd be (one of the) first to solve it. The experience has left me wondering what sort of job this sort of tom-foolery would suit one for. Sure debugging and OS-level skills can be valuable - but the challenge is most time consuming as one is required to guess the objective - identifying the intentions of the challenge setter rather than to address real-world issues.
I didn't realize that reversing IA-32 excutables was the modern meaning of cracking a code. I figured it would be difficult and possibly even rely on dictonary attack of a cryptographic hash, but IA-32 machine code? This sounds like they are more interested in recruiting people to analyze stuff like Stuxnet than to attract people with cryptography, information theory, and signals backgrounds. I don't claim to be crypto expert (I've took an abstract algebra class that is a requirement for all cryptography classes at a university) but my first instinct was to assume that each byte was either an xor'd (as a first pass, to get it out of signed byte space) or residue of some modular division operation. When that didn't work I started analyzing the frequency of the bytes and map them to the letter frequency distribution for English. When I realized that most symbols only appeared once I gave up. If nothing else it was an excuse to brush up on my Python iterators.
I haven't looked at the video yet, because I still want to see how far I can get with just the spoilers in the comments.
Looks like some Brits need to try out the Mystery Hunt
This is an intelligence agency, and network intrusion programs pumping executable code in the attempt at smashing a stack and jumping execution are pretty common.
Perhaps they want people who can quickly spot x86 assembly payloads from raw packet traces as part of a counter aggression op?
If we assume that their network stack isn't riddled with exploitable stack variables or pointers, and that they successfully prevent the code from running, but log the unrequested network access and dump the binary packets to file for analysis, then having people that can "at a glance" determine what kind of data is in those dumps would be valuable.
Being able to determine what it actually is supposed to do even more so.
With the recent hysteria over scada system cyber attacks (I hate that phrase btw..),setting up a fake scada system as a honeypot and seeing what the cat drags in could also make use of this skillset.
So, the obvious questions:
Does the UK fear it has poorly secured scada systems, or does it fear network worm intrusion on some network segement, and if so, what segments or systems are those?
So what screen recorder software did the author use? It plays back nicely.
Stage 1 and 2 were really easy frankly. Especially stage 2 since aside from the small error (intentional or not) in the implementation document it was just some simple coding.
Stage 3 was where I stopped, not because I was daunted or otherwise unable but because I didn't have the tools available to screw around with the exe. It was also at that point that I learned they were too stupid to even put up a robots.txt file and thus were not an organization I had any interest in working for.
I was going to hold this back until the competition was finished, but it seems the cat is out of the bag!
Here is my solution and a writeup of exactly how I got there.
http://www.craig-wood.com/nick/articles/how-i-solved-the-gchq-challenge/
Every man for himself, all in favour say "I"
after googling the solution: https://www.google.com/#q=site:canyoucrackit.co.uk&hl=en&safe=off&filter=0
'Disallow: /' in robots.txt does not stop Google from picking these URLs from other sites...
www.twitter.com/badeip posted an explanation for this challenge a while ago
http://recordmydesktop.sourceforge.net/about.php
This is probably it. Thanks.
For something ontopic. What's all that stuff about morse code in the source then? Are they providing several challenges so they can recruit people of different skill levels and I indeed different areas of expertise? It would seem the logical thing to do. Maybe there is even another even harder challenge in there to get the really really clued up people? Maybe there's a code hidden In the white space of the source code? (Just throwing down the gauntlet and teasing people).
I spend about three hours writing (mostly debugging) a JavaScript implementation of the VM. (I did make a side step to C++ because at one point the JavaScript implementation seemed to run in an infinite loop.) I also discovered that the specification was not very clear and required some interpretation. So stating that stage 2 was the simplest step, and using some code that someone else developed, is not really honest. The biggest problem was with figuring out how the jmp instruction worked and the use of the cs register, because that was not explained in the specification. For more on my implementation see here.
It took me about 3 hours to implement stage 2 in JavaScript (and in C++ when the JavaScript implementation seemed to run in an infinite loop). The specification was not really trivial with respect to the jump instruction and did not explain the use of the cs register, which is not obvious to people who have not worked with 8 mirco processors using segmented memory, such as the Z-80. I always played with the 6502, which doesn't use segmented memory and has a 16-bit program counter.
...for helping GCHQ crack the "Modern Warfare 3" video game copy protection! Well done, everyone!
If you are interested in helping us get free premium television channels, cellular phone minutes, tickets to concerts, and more money from taxpayers, then join our team!
but my first thought was, ...
the passwords probably somewhere amongst the hosted files
and if i did dns lookup, I could either use a web crawler see if anything was left somewhere obvious
or go for a dictionary attack and try and get FTP access and search through the root
but this seemed what's the word, stupid so I didn't bother
kind of wished I had tbh ... 30k a year for using a bit of commonsense
what we've learned; web masters are as lazy I'd thought
The Z80 is not a segmented memory model either; you might be thinking of some of the embedded versions such as the HD64180. It was the x86 architecture that was really afflicted with these segment registers.
Thanks for ruining the challenge in favour of some cheap publicity!!!!
Use google and search for site:canyoucrackit.co.uk and select show ommited results. http://www.google.com/search?q=site:www.canyoucrackit.co.uk&hl=en&safe=off&filter=0
Buhahahaha. Took me 10 mins for stage1, 2h for searching for missing halfkey i.e. stage2 and 10 more minutes for stage 3.
Very challenging indeed. Sk00n