Flawed Online Tutorials Led To Vulnerabilities In Software (helpnetsecurity.com)
An anonymous reader quotes Help Net Security:
Researchers from several German universities have checked the PHP codebases of over 64,000 projects on GitHub, and found 117 vulnerabilities that they believe have been introduced through the use of code from popular but insufficiently reviewed tutorials. The researchers identified popular tutorials by inputting search terms such as "mysql tutorial", "php search form", "javascript echo user input", etc. into Google Search. The first five results for each query were then manually reviewed and evaluated for SQLi and XSS vulnerabilities by following the Open Web Application Security Project's Guidelines. This resulted in the discovery of 9 tutorials containing vulnerable code (6 with SQLi, 3 with XSS).
The researchers then checked for the code in GitHub repositories, and concluded that "there is a substantial, if not causal, link between insecure tutorials and web application vulnerabilities." Their paper is titled "Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability Discovery."
Researchers from several German universities have checked the PHP codebases of over 64,000 projects
The researchers identified popular tutorials by inputting search terms such as "mysql tutorial"
Ah, I see where they went wrong. They should have searched for "real mysql tutorial."
systemd is Roko's Basilisk.
I believe it.
I've come across countless tutorials that cover things like capturing and using form field input, but almost NEVER see a single word in them about sanitizing data, or guarding against bad, malformed, or malicious data.
It's just, "Here's how ya get the data, now go jam it in the database or print it right to the screen!" Fuck me.
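For what it's worth, the step those tutorials skip is small. Here is a minimal sketch in Python (standing in for PHP, where `htmlspecialchars` plays roughly the same role as `html.escape`). The names `NAME_PATTERN`, `validate_name`, and `render_greeting` are made up for illustration, and the allowlist is an assumption about what a "name" field should accept, not a rule from the study.

```python
import html
import re

# Hypothetical allowlist: letters, digits, spaces and a little punctuation,
# between 1 and 40 characters. Adjust to whatever the field really needs.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 .,'-]{1,40}")


def validate_name(raw):
    """Return the trimmed name if it matches the allowlist, else raise ValueError."""
    name = raw.strip()
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name has unexpected characters or a bad length")
    return name


def render_greeting(raw):
    """Escape user input at the point where it is written into HTML."""
    return "<p>Hello, " + html.escape(raw, quote=True) + "!</p>"


# The markup arrives as inert text instead of a live script tag.
print(render_greeting("<script>alert(1)</script>"))
# prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

Validation on the way in and escaping on the way out are separate jobs; the escaping is the part that actually stops the "print it right to the screen" problem.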
And in all fairness, as a PHP user, I've seen a *lot* of PHP tutorials that were bad, stupidly dangerous, or just plain wrong. One of the most egregious was a "tutorial" that showed sending the entire SQL statement to the server as a GET parameter. That's right, some guy actually coded his shit so that it sent a live SQL statement in the URL, and then blithely processed the attached variables without so much as a how-de-do.
Later I saw code that did this exact thing used in various scripts (guestbooks, registration forms, comment forms), probably based on this epically flawed "tutorial".
Just cruising through this digital world at 33 1/3 rpm...
I would love to hear the explanation of how a general purpose language would protect you against attacks like that, clearly called out in the article.
You're doing the snowflake thing, blaming everyone else for the coders' incompetence and unsuitability for the job. Some dweeb wrote a tutorial and because it's not ready to be cut and pasted into production code, that's the tutorial writer's fault.
NB: Not everyone can code.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.
Remember the left-pad crisis that broke the Internet because a developer removed his npm packages over a dispute? How hard is it to write a left-pad function?
The important takeaway here is not that flawed tutorials lead to bad code. It's the implication that one could actually poison tutorials intentionally, perhaps in some very subtle way. While it would be quite difficult to inject malware this way (unless the tutorial convinces some idiot to download this "include file that you need for this function"), it probably wouldn't be too difficult to inject, say, buffer overflows or XSS vulnerabilities that could well be invisible to novice programmers. Those vulnerabilities could then be exploited post-deployment, perhaps using a bot scan of Github to identify broken apps that include the code.
Rust is better because for something on the order of a 10% overhead vs C, it effectively eliminates buffer overflows (unless something is amiss with Rust itself, in which case we have only one bug to fix, but millions of precompiled vulnerabilities in the field). On balance, Rust seems like a net positive to security. It does nothing much, however, to prevent vulnerabilities having nothing to do with memory exploits. For that matter, one could probably write Rust code to exploit Rowhammer. Or poison a tutorial to do that. It would be completely "safe" multithreaded code... that isn't, thanks to ubiquitous shitty DRAM.
There's another, subtler issue: UTF8 hacks. One could post a tutorial and substitute various characters with various similar characters. Maybe, just maybe, one could find a way to get some dufus to copy the code into his source and create an exploit because he confuses one character with another one that looks almost the same (or, even worse, exactly the same due to text rendering shortcomings on his end).
On the vigilante end, I suppose the only solution is to first of all identify the poisoned/flawed tutorials, and secondly to search Github or other repositories for key snippets. This is a hard problem to automate due to the zillions of ways that the tutorial code might be imported into a project and tweaked to fit, without destroying the vulnerability it injects.
So, to the noobs out there: read tutorials, but, at most, copy code from them by retyping it yourself. DON'T DOWNLOAD INCLUDES OR "REQUIRED BINARIES". DON'T CUT AND PASTE CODE INTO YOUR PROJECT. Cross-verify with multiple sources (which could have been manufactured by the same hacker, so beware similar look-and-feel), and if you still don't really understand what you're doing, then do it some other way.
Now, for the public generally, I wish there were a way for us to protect ourselves from this crap. I don't think there is, apart from avoiding software like the plague. It's not like the code you cut and paste from the tutorial is going to create some obvious malware signature in most cases, especially if the tutorial is very abstract in nature. After all, there are endless versions of compilers and compiler settings in use out there.
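The lookalike-character trick, at least, is cheap to check for before pasted code goes anywhere. A rough sketch in Python, assuming the project's source is supposed to be plain ASCII (an assumption; plenty of legitimate code carries non-ASCII strings and comments). The function name `find_non_ascii` and the sample snippet are invented for illustration.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, char, unicode name) for every non-ASCII character."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((line_no, col_no, ch, name))
    return findings


# The "e" in "user" below is U+0435, a Cyrillic letter that renders
# almost identically to a Latin "e" in most fonts.
snippet = "if us\u0435r == 'admin':\n    grant()"

for line_no, col_no, ch, name in find_non_ascii(snippet):
    print(line_no, col_no, name)
# prints: 1 6 CYRILLIC SMALL LETTER IE
```

This only flags the character; it says nothing about whether the surrounding code is otherwise sane, so it is no substitute for actually reading what you paste.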
Tutorials do not write bad code. People write bad code.
A tutorial’s purpose is not to follow good practices in all aspects; such a tutorial would be unreadable. A good tutorial focuses on the one aspect being demonstrated, and, for the sake of exposition, intentionally neglects all other aspects, including, but not limited to, error handling, separation of responsibilities, access control, injection avoidance, naming, cache invalidation, etc.
The implicit assumption is always that you will apply good practices that you already know from elsewhere. You get that by (a) reading books (with wider scope than just a tutorial), (b) reading war stories in blogs (where neglecting a good practice led to a horrible vulnerability or a lengthy debugging session), and (c) participating in actual real-world projects collaborating with and learning from more experienced developers.
where supposedly anyone can 'code', this is the expected result. Why would you expect anything else from cheap inexperienced labor?
Hardly the tutorial's fault. A tutorial will not cover every edge case of your specific example. It will not spend 90% of its length teaching about only partially related topics. I read the same SQL and HTML GET tutorials as everyone else, and a basic understanding of programming I learned in the first month of grade 9 prepared me for sanitizing the input. SQL isn't special, and GET is no different than CIN; it's not rocket science.
Troll is not a replacement for I disagree.
While bad tutorials help make shitty coders, there will always be shitty coders. The question then becomes, "how do we protect internet servers from shitty code?" The answer to this is with secure interfaces, and we've failed at most levels.
Let's start at the top with web serving daemons. Web serving daemons (e.g. Apache) currently support script languages (e.g. PHP) which are a minefield of insecurity. The fact that they were happy to enable script language interpreters and execute them with the same level of privilege as the web serving daemon itself (by default at least), without a second thought, shows a lack of understanding about the dangers they hold.
The next level of insecurity is in the script language interpreters which are being invoked by the web serving daemons. Script language interpreters intended for use with web servers have only "recently" added the ability to restrict certain operations. However, by default, even the most dangerous operations like the execution of text strings are enabled. The most egregious flaw I've seen is in PHP, which lets incoming requests define the value of variables that are not explicitly requested. At no point was this a good idea.
Drilling down, we get to database daemons. Database daemons do not promote the use of a function call based interface but rather a text only interface. Frankly, anything goes with a text based interface which leaves it wide open to naughty inputs. A text interface is a wonderful concept for ease of use but it's just terrible for security.
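To make the text-versus-call distinction concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a database daemon. The closest thing common drivers offer to a function-call interface is a bound parameter, where the value travels separately from the SQL text. The table, the helper names, and the payload are all made up for the demonstration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # The input is bound as a value and is never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = build_demo_db()
payload = "nobody' OR '1'='1"

print(len(find_user_unsafe(conn, payload)))  # prints 2: every row leaks
print(len(find_user_safe(conn, payload)))    # prints 0: no user has that name
```

The text interface is still there underneath, but with placeholders the naughty input has nowhere to go.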
I know that it's the shitty coders' fault for writing shitty code, but a defensive approach to design is something we should strive for to increase our level of security.
Anons need not reply. Questions end with a question mark.