Ask Slashdot: Automated Verification For Uploaded Files?
VernonNemitz writes: There are a lot of ways for hackers to abuse a web site, but it seems to me that one of them is receiving less attention than it deserves. This is the simple uploading of a malware file, that has an innocent file-name extension. I'm looking for a simple file-type verification program that the site could automatically run, on each uploaded file, to test it to see if it is actually the type of file that its file-name extension claims it is. That way, if it ever gets double-clicked, we can be assured it won't hijack the system or worse. At the moment I'm only interested in testing .png files, but I'm sure plenty of web site operators would want to be able to test other file types. A quick Googling indicates the existence of a validator project under the OWASP umbrella, but is it the best choice, and what other choices are there?
It would be simpler to just check whether the file is executable in some way and, if its file extension doesn't match, throw up a red flag.
Well, if you are running on a Linux or Unix/BSD host, you can use the "file" utility.
Of course, that means you need shell_exec() or exec() (or whatever your programming language of choice uses for running shell commands) to be enabled, and you take on the other security dangers/issues involved with allowing that type of stuff.
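A rough sketch of calling file(1) without handing anything to a shell, in Python rather than PHP. The function name detect_mime_type is made up for illustration, and it assumes a "file" binary is on the PATH. The --brief and --mime-type options are standard file(1) options.

```python
import subprocess


def detect_mime_type(path: str) -> str:
    """Ask file(1) for the MIME type of `path` without going through a shell."""
    # Passing an argument list (and no shell=True) means characters such as
    # ';' or '$' in the file name are never interpreted by a shell.
    # The '--' stops option parsing, so a name starting with '-' is a path.
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.strip()
```

This only removes the shell-injection worry. It does nothing about the limits of magic-number detection itself.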
What may be best/easiest/safest would be to NOT allow direct HTTP access to the uploaded files, but rather use a wrapper script that would send appropriate headers to make the browser believe that the file is of the type "x-application/unknown" or whatever content type that will force a "save as" dialog instead of opening with a plugin, auto opening with a local application, etc.
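For what it's worth, here is roughly what such a wrapper might send, sketched in Python. The helper name download_headers is hypothetical. It uses application/octet-stream instead of the "x-application/unknown" type mentioned above, and adds X-Content-Type-Options: nosniff, which is my addition and not something the poster specified.

```python
import re


def download_headers(original_name: str) -> dict:
    """Build response headers that make a browser save the file, not render it."""
    # Keep a conservative character set so the name cannot break out of the
    # quoted header value or smuggle in CR/LF.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", original_name) or "download"
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

The wrapper script would emit these before streaming the file body from a directory that is not directly reachable over HTTP.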
man 1 file
In PHP, simply run something like the following against the file and see if you get a valid result back
http://php.net/manual/en/funct...
http://php.net/manual/en/funct...
Just check that the first 8 bytes of the file are what they should be. If you're just worried about PNG, then check that it's a PNG. If you're looking for malicious PNGs, it's a different ball game: get a virus scanner. If you want other types, then get an API for checking the magic number.
This should be on Stack Overflow, not Slashdot.
The file command does exactly this. Type in "file foo", it will tell you what it is.
No need to add any additional software to the Linux box.
Anybody still using systems that run stuff found on a webpage? Wouldn't such hopeless systems die out from the damage?
libmagic(3) and file(1). Plus, if you need to tune them, magic(5).
This sounds like a bizarre use case, or at worst, a college project that someone can't figure out.
test it to see if it is actually the type of file that its file-name extension claims it is.
There are various ways to make "hybrid" files which are multiple types. Graphics files which are also archives, etc. What you really want to do is normalize the files to the type they're supposed to be. PNGs are a good candidate for this because PNG is lossless, so you can decode the image and re-encode it without losing information.
Image files can contain metadata. Metadata can contain PHP tags.
Append a path to an image, and a suitably (mis-)configured server will treat the image file as a PHP script. e.g. if the image is available at .../path/to/image.png
and you fetch the URL .../path/to/image.png/foo.php
then the embedded PHP will be executed.
This won't work because a file can be a valid file in multiple formats at once and it can also be an invalid file that is nevertheless interpreted as a valid file as well.
Take for example, a plain-text file. Harmless, right? Nope. It can also be a valid HTML file containing executable JavaScript. Or an XML file containing a billion laughs attack.
Or take media type sniffing. Some browsers bend over backwards to interpret crap as HTML even when labelled otherwise by the Content-Type HTTP header. So one attack is to stuff enough HTML into PNG metadata to confuse a browser that doesn't follow the standards into thinking that it's HTML. This is a valid PNG file and anything that checks to see if it's really a PNG file will tell you that much. But it's still not safe.
http://man.cx/file
Modern app appers know that only apps can app apps, so if everything is uploaded as .app, then apps will app apps while apping other apps!
Apps!
I just want to piss up your nose and shit all over your face and smear it into your skin.
If I recall correctly, you have the file in memory before you save it to disk. Check if the first bytes are 0x89504E470D0A1A0A and it should be "close enough". I'm not sure if anyone has compiled a list of headers and file extensions, but it seems a little overkill.
And what if there's a semicolon or another interesting character in the filename ?
Don't accept foreign input and put it out as your own (on your web page). It's just a disaster waiting to happen. Misconfigurations or bugs could happen at any point.
What you do is take the input and verify that it's the input you're expecting. Don't just check that it looks like a PDF or a PNG file: accept only PDF/PNG, then parse it and rewrite it in a way that takes out any and all foreign input. If you're expecting text, only parse text; if you're expecting images, only parse images; and do all the parsing within a jail with limited permissions. If the file is 'broken' or contains any scripts or anything else unexpected (i.e. it doesn't parse well enough), reject it.
There are all sorts of tools (based on so-called "magic" numbers) for determining file types, but they only take a look at the first few bytes and return an answer based on a table. You could easily fool those things, and they don't check whether the files are valid or not. Additionally, check for viruses.
Yo dawg, is that you?
This is on the right track, because as others have said, just because it's a valid PNG doesn't mean it's not also valid PHP and JavaScript. I just pulled a file like that off a server yesterday.
HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities: directory traversal (protection from it is harder than it looks), allow_url_fopen, and memory depletion from failing to disable the output buffer before reading and writing chunks of the file.
A better, safer, higher-performance option is to RemoveHandler PHP and RemoveHandler cgi-script in the designated upload directory, which should be the only directory that's writable.
A further problem this solves: since the directory is writable, the designated upload script which checks the files is probably NOT the only mechanism that can put files there. Imperfections in other scripts will allow bad guys to upload any file they want to the world-writable directory*. Therefore, use httpd.conf to ensure that any scripts in that directory cannot run.
* Instead of making it -explicitly- world-writable, you can use SuExec, which effectively makes the ENTIRE SITE world-writable. This is extremely stupid.
Sadly, Unix's 'file' utility is not sufficient for security purposes. Generally, file only checks for magic numbers near the beginning of the file. Many file formats remain valid even with prepended data. For example, Python programs with several source files can be archived into a single zip file and still be executed, but you can stick a shebang onto the beginning and still have Python (or most zip programs) recognise the archive as a zip file. There's a good video on YouTube about this kind of thing: https://www.youtube.com/watch?... tl;dr: This is security. It goes wrong in amusing and unobvious ways.
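The prepended-shebang trick is easy to reproduce with nothing but Python's standard library. A small sketch, with a made-up member name and shebang line:

```python
import io
import zipfile

# Build a small zip archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello from inside the archive')\n")

# Prepend a shebang line, as an executable Python zip application would have.
polyglot = b"#!/usr/bin/env python3\n" + buf.getvalue()

# A check of the leading bytes no longer sees the zip magic number "PK"...
starts_with_zip_magic = polyglot.startswith(b"PK")

# ...yet zipfile locates the central directory from the END of the data
# and still reads the archive.
still_a_zip = zipfile.is_zipfile(io.BytesIO(polyglot))
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
```

So a checker that looks only at the first bytes and a consumer that looks at the last bytes can disagree about what the same file is.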
Excuse for why is your room always messy?
look out for sql injection as well.
This doesn't even make sense. In what universe does double-clicking an executable file with a PNG extension cause the file to be executed?
Try a reverse proxy with a malware scanning component.
Or subscribe to the premium service for Virus Total and use the API to check all uploads to your server.
Zip of death
Is it a zip file? Yes
Is it dangerous? Yes
So how do you test for this without opening the file in a virtual environment and seeing what happens?
I have a feeling that testing for malicious files is akin to solving the halting problem
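For the zip case specifically, one partial check that doesn't involve extracting anything is to read the sizes the archive declares for itself. A sketch in Python; the function name and both thresholds are arbitrary assumptions, and the declared sizes can be forged, so extraction should still enforce its own limits. Nested archives are not handled at all.

```python
import io
import zipfile


def looks_like_zip_bomb(data: bytes,
                        max_total: int = 100 * 1024 * 1024,
                        max_ratio: float = 100.0) -> bool:
    """Flag archives whose declared contents are suspiciously large."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()  # reads the central directory, extracts nothing
    total_uncompressed = sum(info.file_size for info in infos)
    total_compressed = sum(info.compress_size for info in infos)
    if total_uncompressed > max_total:
        return True
    # Guard against division by zero for empty or zero-length members.
    ratio = total_uncompressed / max(total_compressed, 1)
    return ratio > max_ratio
```

It catches the lazy cases cheaply, which is about all a static check can promise here.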
There's no way to determine what type a file really is. File types are designated in the Windows world by extensions (the .jpg in bigdick.jpg), but applications and other OSes use actual file information (typically the first few / few dozen bytes) of the file to determine what to do with it.
This typically involves some specific byte sequence, or "magic number", which alerts the OS/program to start trying to read a particular type of header, or tells it the file is big/little endian.
However, ANY file can contain those strings, and I've run into cases where Office docs have contained the magic numbers for JPG (or was it PNG) and shit got all fucked up. The best you can really do is trust the file extension / mimetype after your virus scanner says it's okay. Then you can TRY to process the file as what it claims to be and handle failures gracefully. If you want to be nice you can try to scan the files for those magic byte sequences, but I gave up on doing that because it's a fucking pain. I just bail out and tell the user to upload working shit, not my problem.
When you're talking about PNG, if you're looking to avoid malicious files, you can just check the headers.
It's always the following decimal values:
137 80 78 71 13 10 26 10
Things get more tricky when you're talking about an exploitable file type, in which additional validation is required, but for most purposes, if the file being broken won't ruin the application, this is fine.
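Those eight values translate directly into a check. A minimal Python sketch (the function name is mine):

```python
# The eight decimal values listed above; equal to b"\x89PNG\r\n\x1a\n".
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def has_png_signature(data: bytes) -> bool:
    """True if the data begins with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE
```

This only says the file starts like a PNG. It says nothing about what follows the signature.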
In addition to the above method, I simply ignore the original filename (and save it somewhere) and rename the file to a random UUID+the auto detected extension (for images you only need a couple of headers, for example).
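Something along these lines, sketched in Python. The signature table is a small hypothetical allowlist, not a complete list, and storage_name is an invented helper.

```python
import uuid

# Hypothetical allowlist: leading bytes -> extension to store the file under.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def storage_name(data: bytes):
    """Return a random name with the detected extension, or None to reject."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            # The user-supplied name is never used on disk.
            return uuid.uuid4().hex + ext
    return None
```

The original filename would go into a database column or similar, purely for display.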
If the rate and resolution of the PNGs isn't too great, I've found an elegant solution is to display the image on one cheap system's monitor (think Raspberry Pi 1), with a camera on a 2nd system taking an image whenever the screen isn't blank and performing OCR on text below for the file name and any other metadata. You can wrap the whole thing in cardboard so light saturation isn't an issue. There is a history of image-related exploits that have nothing to do with whether the file is executable or not, and this provides a guard against them.
For image files, just convert to another format at the highest possible resolution and then back again. Maybe an executable could survive that, but I have yet to see one get through (and yes, I've tried it with some infected and/or bogus files).
And yes, I fully admit that it's a sleazy trick but it seems to work pretty well.
For other file types, I dunno.
There are several layers here that make a solution quite "interesting". On the one hand you are trying to protect your users by avoiding serving them bad content. On the other hand you want to protect your service. Protecting your users means doing more work on the uploaded content which increases your own attack surface.
Personally, if we are just talking about PNGs, then I think that one of the safest things for your clients/customers would be to not serve the file as uploaded, but to serve a file that is the result of a successful render->save process (which might get you a bonus improvement of allowing you to optimise the image). That way you should end up serving a valid image without any dodgy stuff someone may have tried to sneak through. Of course there have been plenty of vulnerabilities in image handling over the years. So reprocessing the images does come with its own risk, which might suggest its own mitigations (e.g. doing it on a separate untrusted server that doesn't have access to anything interesting).
There might be third-party services you could use, but of course that opens up its own questions in terms of trust, security and availability.
Read up on the efforts some router and modem brands go to in order to protect their firmware, e.g. updates over the life of a product line.
Signed checksum, private key, verified public key systems.
HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities.
1) Protection from directory traversal is harder than it looks,
2) allow_url_fopen, and
3) memory depletion from failing to disable the output buffer before reading and writing chunks of the file.
I'm a PHP dev, and the first two are relatively straightforward to prevent, e.g. check that basename($file) == realpath(basename($file)), that kind of stuff. But #3 is interesting to me; how would the following cause any problem?
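$fp = fopen($hugefile, 'rb');
// Compare against false explicitly: a final line of "0" is falsy in PHP.
while (($line = fgets($fp, 1024)) !== false) {
    echo $line;
}
fclose($fp);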
In this case, the buffered output will be spooled to Apache/end user as it fills. Or did you mean OOM errors from trying to load a 2 GB file into RAM?
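On the first of the three issues, the usual shape of a traversal check is "resolve the path, then confirm it is still under the base directory". A Python sketch of that idea (not the PHP one-liner above); resolve_inside is a made-up name.

```python
import os


def resolve_inside(base_dir: str, requested: str) -> str:
    """Return the real path of `requested` under `base_dir`, or raise ValueError."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' components and follows symlinks before we compare.
    target = os.path.realpath(os.path.join(base, requested))
    # commonpath compares whole path components, so '/uploads-evil' does not
    # count as being inside '/uploads'.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the download directory")
    return target
```

Note that an absolute `requested` replaces the base entirely in os.path.join, which is why the comparison happens after resolution and not before.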
You need to call ob_flush() and flush() after each echo, or PHP will buffer roughly the entire thing in RAM. When a bad guy hits it, he'll have it buffer 100,000 copies in RAM.
You'll also need to send Content-Length header manually in the PHP, otherwise the header can't be set without buffering the whole file. Compression and encoding can bite you here, so disable compression. Of course you've kinda broken resume, if someone loses their connection halfway through the download. OR ...
Check out X-Sendfile. That's an all-around better option. Content-Length, compression, partials, HEAD: all of that is already taken care of. If you use an older version of Apache, it will need to be installed as a module.
As to #2, allow_url_fopen: there are a shit ton of ways that gets exploited, so really the "right" answer, IMHO, is to make sure it's disabled, then double-check the input anyway.
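The memory point is language-independent: read and send a bounded chunk at a time, and know the length up front. A Python sketch of that shape, with invented function names and an arbitrary 64 KiB chunk size. It says nothing about PHP's output buffering itself.

```python
import os


def iter_file_chunks(path: str, chunk_size: int = 64 * 1024):
    """Yield the file in fixed-size pieces so memory use stays bounded."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def content_length(path: str) -> int:
    """Size to put in the Content-Length header before streaming starts."""
    return os.path.getsize(path)
```

At most one chunk per request is held in memory, however large the file is. Resume and range requests are still not handled, which is the argument for X-Sendfile.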
Yeah. Some would probably argue it's overkill; and of course it opens a potential new exploit (if ImageMagick or the GD library or whatever you use has a serious flaw) - but for the really paranoid applications I've worked on, I generate a new image from the old one, using a trusted library. I figure that converting whatever is "valid image format data" into plain RGB(A) and back to image format data again will get rid of anything seriously nasty.
For PNG files specifically, there is a "pngcheck" utility that parses the file and verifies the contents are valid.
If you want to go a step further, you can use "pngcrush" to parse and repack/compress the file and strip out any extra data chunks that are not required to display the image. That should strip out any malicious or malformed content, and can be run on a sandbox that is not directly accessible, so if there is a compromise of pngcrush or pngcheck the effects can be isolated.
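To give a feel for what "parse and strip extra chunks" means, here is a stdlib-only Python sketch. It is NOT pngcheck or pngcrush and makes no claim to match their behaviour. It walks the chunk list, verifies each CRC, keeps an assumed allowlist of chunk types, and drops everything after IEND. It does not decompress or validate the pixel data, so it is weaker than a real decode and re-encode.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumed allowlist: the critical chunks plus tRNS, which affects rendering.
KEEP = {b"IHDR", b"PLTE", b"tRNS", b"IDAT", b"IEND"}


def strip_png(data: bytes) -> bytes:
    """Validate chunk layout and CRCs; return a copy with only KEEP chunks."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    out = [PNG_SIGNATURE]
    pos, first, seen_end = 8, True, False
    while pos + 12 <= len(data):
        # Chunk layout: 4-byte length, 4-byte type, body, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(data):
            raise ValueError("truncated chunk")
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + body) != crc:  # CRC covers type + body
            raise ValueError("bad CRC")
        if first and ctype != b"IHDR":
            raise ValueError("first chunk is not IHDR")
        first = False
        if ctype in KEEP:
            out.append(data[pos:end])
        pos = end
        if ctype == b"IEND":
            seen_end = True
            break  # anything after IEND is dropped
    if not seen_end:
        raise ValueError("no IEND chunk")
    return b"".join(out)
```

Text chunks (tEXt, iTXt, zTXt) and trailing bytes, the usual hiding places for embedded PHP or HTML, do not survive this. As with the real tools, it is best run in a sandbox.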