Flawed Online Tutorials Led To Vulnerabilities In Software (helpnetsecurity.com)
An anonymous reader quotes Help Net Security:
Researchers from several German universities have checked the PHP codebases of over 64,000 projects on GitHub, and found 117 vulnerabilities that they believe have been introduced through the use of code from popular but insufficiently reviewed tutorials. The researchers identified popular tutorials by inputting search terms such as "mysql tutorial", "php search form", "javascript echo user input", etc. into Google Search. The first five results for each query were then manually reviewed and evaluated for SQLi and XSS vulnerabilities by following the Open Web Application Security Project's Guidelines. This resulted in the discovery of 9 tutorials containing vulnerable code (6 with SQLi, 3 with XSS).
The researchers then checked for the code in GitHub repositories, and concluded that "there is a substantial, if not causal, link between insecure tutorials and web application vulnerabilities." Their paper is titled "Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability Discovery."
Researchers from several German universities have checked the PHP codebases of over 64,000 projects
The researchers identified popular tutorials by inputting search terms such as "mysql tutorial"
Ah, I see where they went wrong. They should have searched for "real mysql tutorial."
systemd is Roko's Basilisk.
First the Archive.org madness, and now this... It's just one thing after another. Idiocy upon idiocy... I cannot find the energy anymore to really care about any of it. I can't control it, nobody listens to my voice, and it doesn't *really* affect me as I code everything on my own or VERY carefully look through the "snippet" of code if it's something really specific yet complex that I need.
People learn not by memorizing but by looking at examples.
Most of the people working in Web-related jobs are not security experts, their job is to get things done as quickly and cheaply as possible. You might think in terms of huge corporations where IT is divided in groups, each working on specific parts of the whole. At the smaller scale though, the same person responsible for the front-end with HTML, CSS and JavaScript has to work on the back-end with PHP and MySQL. The second their code does what it's supposed to do they just keep going to the next item to be completed.
So really, blame the examples and especially the tools. This is a similar problem to languages that allow you to point outside the bounds of an array or operating systems that can't protect themselves from applications gone rogue. Security should be part of the OS, part of the languages, part of the frameworks. Only then can you blame the coders at the bottom for security holes.
#DeleteFacebook
I believe it.
I've come across countless tutorials that cover things like capturing and using form field input, but almost NEVER see a single word in them about sanitizing data, or guarding against bad, malformed, or malicious data.
It's just, "Here's how ya get the data, now go jam it in the database or print it right to the screen!" Fuck me.
And in all fairness, as a PHP user, I've seen a *lot* of PHP tutorials that were bad, stupidly dangerous, or just plain wrong. One of the most egregious was a "tutorial" that showed sending the entire SQL statement to the server as a GET parameter. That's right, some guy actually coded his shot so that it sent a live SQL statement in the URL, and then blithely processed the attached variables without so much as a how-de-do.
Later I saw code that did this exact thing used in various scripts (guestbooks, registration forms, comment forms), probably based on this epically flawed "tutorial".
Just cruising through this digital world at 33 1/3 rpm...
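For what it's worth, the step those tutorials skip is small. Below is a minimal sketch in Python (not PHP, and not taken from any of the tutorials discussed) of validating a numeric form field and escaping free text before echoing it into a page. The names `parse_age` and `render_comment` and the 0-150 range are made up for illustration.

```python
import html


def parse_age(raw: str) -> int:
    """Validate a numeric form field instead of trusting it.

    Hypothetical rule for this sketch: ASCII digits only, range 0-150.
    """
    value = raw.strip()
    if not (value.isascii() and value.isdigit()):
        raise ValueError("age must consist of digits only")
    age = int(value)
    if age > 150:
        raise ValueError("age out of range")
    return age


def render_comment(user_input: str) -> str:
    """Escape user text before placing it inside an HTML element.

    html.escape replaces <, >, & and (with quote=True) both quote
    characters, so markup in the input is shown as text, not executed.
    """
    return "<p>" + html.escape(user_input, quote=True) + "</p>"
```

Escaping like this only covers text placed inside an HTML element or a quoted attribute; input that ends up inside a script block or a URL needs different handling.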
I would love to hear the explanation of how a general purpose language would protect you against attacks like that, clearly called out in the article.
You're doing the snowflake thing, blaming everyone else for the coders' incompetence and unsuitability for the job. Some dweeb wrote a tutorial and because it's not ready to be cut and pasted into production code, that's the tutorial writer's fault.
NB: Not everyone can code.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
If a bridge collapses, do you blame the production workers who followed the plans exactly as they were or do you blame the engineer who was too lazy to make the proper calculations and didn't get the tests done for the bedrock foundation, etc?
You're doing the popular "snowflake attack" thing here, when in fact you're the snowflake thinking that everyone is as good as you are. The thing is, as I said, it's not everyone's job to be a security expert. We should expect security to be part of the tools instead of expecting everyone to be a security expert. The first is possible, the second will never be.
#DeleteFacebook
The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.
Remember the left-pad crisis that broke the Internet because a developer removed his npm packages over a dispute? How hard is it to write a left-pad function?
And it's Google's fault for returning the tutorial on page one of the results. What, they're using an automated algorithm? So they need to adjust their algorithm.
The important takeaway here is not that flawed tutorials lead to bad code. It's the implication that one could actually poison tutorials intentionally, perhaps in some very subtle way. While it would be quite difficult to inject malware this way (unless the tutorial convinces some idiot to download this "include file that you need for this function"), it probably wouldn't be too difficult to inject, say, buffer overflows or XSS vulnerabilities that could well be invisible to novice programmers. Those vulnerabilities could then be exploited post-deployment, perhaps using a bot scan of Github to identify broken apps that include the code.
Rust is better because for something on the order of a 10% overhead vs C, it effectively eliminates buffer overflows (unless something is amiss with Rust itself, in which case we have only one bug to fix, but millions of precompiled vulnerabilities in the field). On balance, Rust seems like a net positive to security. It does nothing much, however, to prevent vulnerabilities having nothing to do with memory exploits. For that matter, one could probably write Rust code to exploit Rowhammer. Or poison a tutorial to do that. It would be completely "safe" multithreaded code... that isn't, thanks to ubiquitous shitty DRAM.
There's another, subtler issue: UTF8 hacks. One could post a tutorial and substitute various characters with various similar characters. Maybe, just maybe, one could find a way to get some dufus to copy the code into his source and create an exploit because he confuses one character with another one that looks almost the same (or, even worse, exactly the same due to text rendering shortcomings on his end).
On the vigilante end, I suppose the only solution is to first of all identify the poisoned/flawed tutorials, and secondly to search Github or other repositories for key snippets. This is a hard problem to automate due to the zillions of ways that the tutorial code might be imported into a project and tweaked to fit, without destroying the vulnerability it injects.
So, to the noobs out there: read tutorials, but, at most, copy code from them by retyping it yourself. DON'T DOWNLOAD INCLUDES OR "REQUIRED BINARIES". DON'T CUT AND PASTE CODE INTO YOUR PROJECT. Cross-verify with multiple sources (which could have been manufactured by the same hacker, so beware similar look-and-feel), and if you still don't really understand what you're doing, then do it some other way.
Now, for the public generally, I wish there were a way for us to protect ourselves from this crap. I don't think there is, apart from avoiding software like the plague. It's not like the code you cut and paste from the tutorial is going to create some obvious malware signature in most cases, especially if the tutorial is very abstract in nature. After all, there are endless versions of compilers and compiler settings in use out there.
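The look-alike character trick described above can at least be surfaced before pasted code goes into a project. What follows is a crude sketch, assuming the source is supposed to be plain ASCII: it reports every non-ASCII character with its Unicode name, so it will also flag perfectly legitimate text such as accented names in comments. The function name `suspicious_characters` is hypothetical.

```python
import unicodedata


def suspicious_characters(source: str):
    """List every non-ASCII character in a piece of source code.

    Returns (line, column, character, Unicode name) tuples, 1-indexed.
    This is a blunt check: it does not know which characters are
    actual look-alikes, it just shows what is not plain ASCII.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((line_no, column, ch, name))
    return findings
```

Run against a snippet where the "a" in 'admin' has been swapped for a Cyrillic letter, it reports that position as CYRILLIC SMALL LETTER A, which no amount of staring at the rendered text would reveal.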
The internet as a whole is a tool for quick mass communication. Did people really think it would be impervious to stupidity and incompetence? Social media, news on social media.. Philosophers have long written about the effects of incorrect statements affecting humanity like ripples from a stone being thrown into a pond. They hadn't imagined a world where the pond is electric with ripples traveling outward at the speed of light.
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
Whoa! Blueprints for a bridge is a very bad analogy for tutorial code. Seldom do you see certified bridge engineers cutting and pasting features of one bridge design into another. Concepts may be applied but only after due diligence.
Similarly, unless the tutorial is on writing or securing code/services, it should only be considered an introduction to a topic or concept.
Tutorials do not write bad code. People write bad code.
A tutorial’s purpose is not to follow good practices in all aspects; such a tutorial would be unreadable. A good tutorial focuses on the one aspect being demonstrated, and, for the sake of exposition, intentionally neglects all other aspects, including, but not limited to, error handling, separation of responsibilities, access control, injection avoidance, naming, cache invalidation, etc.
The implicit assumption is always that you will apply good practices that you already know from elsewhere. You get that by (a) reading books (with wider scope than just a tutorial), (b) reading war stories in blogs (where neglecting a good practice led to a horrible vulnerability or a lengthy debugging session), and (c) participating in actual real-world projects collaborating with and learning from more experienced developers.
See subject & this post - Especially regarding "code sharing" (plagiarism) that backfires for security https://developers.slashdot.or... and yes, it takes away the "mental exercise" of doing it yourself that makes you stronger.
APK
P.S.=> It's not the greatest idea doing "opensores" (Chrome EFast is my evidence here) or "codesharing" per the above... apk
where supposedly anyone can 'code', this is the expected result. Why would you expect anything else from cheap inexperienced labor?
Hardly the tutorial's fault. A tutorial will not cover every edge case of your specific example. It will not spend 90% of its length teaching about only partially related topics. I read the same SQL and HTML GET tutorials as everyone else, and a basic understanding of programming I learned in the first month of grade 9 prepared me for sanitizing the input. SQL isn't special, and GET is no different than cin, it's not rocket science.
Troll is not a replacement for I disagree.
That's the problem - In your analogy, you've hired "production workers" when you actually need engineers.
If you're hiring people dumb enough to copy and paste code into production, without understanding the ramifications, you deserve what you get.
How many slashes would a slashdot dot, if a slashdot could dot slashes?
So, I know how everyone feels. If something goes bad code wise, it goes bad for all of us, whether we update or not thanks to a thousand apps running the same single API. Open source used to destroy open source only to kill the desktop because they can't invent a new architecture fast enough to sell new computers. And, the new ones now aren't that much different, if not less powerful than the ones five years ago. So, the Google and Window$ come up with as many apps that need Internet to work as they can to lock you into their bs. And now, Mark Shuttleworth wants to focus on cloud computing for Ubuntu as well. They're killing the desktop and money is the only reason. More problems from bad code means more money to fix or replace computers. I like how they say things like open source to describe their server based software. Ok, but what good does that do for the average person? Do we all buy our own servers? Companies using open source to destroy privacy and control on desktops and AI to destroy encryption. That's the future.
While bad tutorials help make shitty coders, there will always be shitty coders. The question then becomes, "how do we protect internet servers from shitty code?" The answer to this is with secure interfaces and we've failed at most levels.
Let's start at the top with web serving daemons. Web serving daemons (e.g. Apache) currently support script languages (e.g. PHP) which are a minefield of insecurity. The fact that they were happy to enable script language interpreters and execute them with the same level of privilege as the web serving daemon itself (by default at least) without a second thought shows a lack of understanding about the dangers they hold.
The next level of insecurity is in the script language interpreters which are being invoked by the web serving daemons. Script language interpreters intended for use with web servers have only "recently" added the ability to restrict certain operations. However by default, even the most dangerous operations like the execution of text strings are enabled. The most egregious flaw I've seen is in PHP, which allows defining the value of variables that are not explicitly requested. At no point was this a good idea.
Drilling down, we get to database daemons. Database daemons do not promote the use of a function call based interface but rather a text only interface. Frankly, anything goes with a text based interface which leaves it wide open to naughty inputs. A text interface is a wonderful concept for ease of use but it's just terrible for security.
I know that it's the shitty coders' fault for writing shitty code, but a defensive approach to design is something we should strive for to increase our level of security.
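To make the point about text interfaces concrete, here is a small sketch using Python's bundled sqlite3 module (an assumption for illustration; the thread is about PHP and MySQL). The table, the sample rows and the function names are invented. The first lookup builds the statement as text; the second passes the value as a bound parameter, which is about as close to a function-call interface as most database drivers get.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    """Text interface: the input becomes part of the SQL statement itself."""
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    """Bound parameter: the input is passed as data and never parsed as SQL."""
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()
```

With the input `nobody' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.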
Anons need not reply. Questions end with a question mark.
Is the salary for online free tutorial writer in the same ballpark as bridge engineer? Someone should tell these open source authors not to release anything unless it's perfect, either.
I don't get why so many developers WON'T use the real documentation. Heck, many of them WON'T even download from official sources, instead relying on third-party collections with obsolete versions, or, worse (at least potentially) intentionally hacked/poisoned mods.
WHY do so many use W3Fools? I once had a Google filter set-up to keep them out of search results. But W3Fools gamed Google with dozens or hundreds of different domains, until the technique became widespread and Google threw in the towel and removed the filter feature. W3Fools is the WORST possible place to get accurate information. I half-suspect it is actually a Russian or Chinese initiative to spread absolute crap all over the Internet. Find out where W3Fools is blocked. That will tell you who is behind it! ;)
MDN is a GREAT site for learning HTML/CSS/JS. The jQuery Learning Center is a GREAT site for learning jQuery. Why do so many flock to tutorial sites with horrible quality and WRONG information?
I don't use PHP, so don't know if the official documentation is GREAT. I have to guess, though, that after all these years, it can't be totally awful.
(Last time I used PHP was like a year after it first appeared. I think I had it emailed to me by the author. I feel for the author, who is probably blamed by many for its failings. It was just a simple script to help him with his blog site, and he was an amateur. I do not mean "amateur" in a disparaging way, I mean it in a descriptive, literal sense. Others took it up and built crap on top of a simple script with a simple purpose. Along the way, there's been a corrective course that turned it into a language with a not-completely-awful syntax, but the developers haven't had the will to remove the awful parts. It seems impossible to get PHP developers to stop cutting-and-pasting, and to stop using the awful parts.)
CODE DOES NOT BELONG IN HTML TEMPLATES. CODE DOES NOT BELONG IN HTML TEMPLATES! CODE DOES NOT...
Unfortunately, that's how MOST PHP sites are written.
Millennials, who refuse to do anything the traditional way. This won't be the only time in life it comes back to bite you on the buttocks. Ironic that your helicopter parents did such a crappy job of teaching you anything given how omnipresent they were (are), huh?
I don't get why so many developers WON'T use the real documentation
Because reading documentation is a skill many of them have not developed yet.
"First they came for the slanderers and i said nothing."
Had a room-mate learning to code. They got some code from their college course as
char inchar;
while( (inchar = fgetc(ifile)) != -1) { .... }
Seldom do you see certified bridge engineers cutting and pasting features of one bridge design into another.
I'm not so sure about that. I worked a few years in construction with my father. He examined the blueprints for every project carefully to order his list of materials and occasionally found minor mistakes that he needed to correct on the ground. One day he found a three-foot wide mistake between two pages of the blueprint for the same wall. The architect called bullshit because the two pages lined up perfectly with each other. So my father had the architect and main contractor walk the layout on the ground for the foundation before it got poured. Lo and behold, a three-foot wide gap was found that prevented the wall from lining up. That problem was in the blueprint and not how the foundation guys laid the layout, although the main contractor yelled at them for not catching the mistake sooner.
I find it amazing that those that are the most efficient, cannot create worth a damn. Maybe a demonstration of bleeding edge software design in some field few even understand, like maybe human neural patterns? Of course, googling would only be a suggestion.
Decades old question: should programming become an engineering level profession?
And dogma will eat your homework.
No other explanation is needed. Why bother blaming bad tutorials?
Instead of actually being COMPETENT, we now have armies of phoney coders who know little about what they are doing because all they're really doing is pasting together lots of blobs of crappy code they found on the internet.
Any business employing these fake programmers deserves all the badness that comes later when their junk collapses and the fake programmers are unable to fix it because they never fully understood how it sortof worked since they never actually wrote most of it.
Before the internet, this particular problem did not really exist.
Hey, you kids, GIT OFFA MY LAWN! (or at least learn to actually write and understand your own code)
Ideally they should study the real documentation on the company's paid training time. Learn the thing from ground up. Solve problems using both knowledge and critical thinking.
In reality they're shouted down by the boss: "GET THIS WORKING RIGHT NOW I DON'T CARE FUCKING HOW YOU CODE MONKEY!!!!"
I've been teaching people about this for ages. I have reviewed perhaps a couple hundred recruitment tests as well. You would be shocked how many can't even indent and you see injections all the time. I sometimes prefer to see manual escaping using the provided functions rather than prepared queries because prepared hides the problem. I am pretty sure a lot of them use tutorials as they are doing the test and it makes me wonder.
When I am training juniors one of the first things I get them to do is to learn to go straight to the authoritative manual and ignore the top results. I explain to them how SEO works and that these people are making money sometimes practically plagiarising the manual and then using SEO to get their ad laden version to the top. Originally it came from some languages having bad manuals or really thick specifications (W3C never wrote decent manuals). They get a bit of traction being one site for multiple languages as well.
It's not just tutorials but also questions (common error messages, problems, etc). Stack Overflow has done a really good job of cleaning things up. I still see junk in there rarely but you can at least add your piece. One of the worst cases I saw was when I searched for best MySQL practices and found a guide with some questionable things or poor explanations. The result of that is that I had one developer adding LIMIT 1 to the end of a hundred queries entirely needlessly. This practice added noise to code and led to obscuring possible errors (where if you don't get zero or one result the query is broken). It was never added in the kind of situation you would actually want to limit by one such as with an order by to get the top result. These practices are useless if someone doesn't actually understand what is going on and if they understand that it should be immediately obvious when and when not to limit. That was one of many dodgy things in there. I think the worst has to be where if you downloaded something precompiled for a certain platform you would get an error message about having the wrong library version (dynamic linker error). The guide explained how to hex edit it to look for a different version. Aside from the potential issues this can cause with compatibility and segfaults (security as well), the software was open source so could have been downloaded and recompiled.
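On the LIMIT 1 point above, one alternative to papering over a query is to make the "zero or one result" expectation explicit and fail loudly when it does not hold. Here is a rough sketch using Python's sqlite3 module as a stand-in (the original case was MySQL); the helper name `fetch_at_most_one` is hypothetical.

```python
import sqlite3


def fetch_at_most_one(conn: sqlite3.Connection, sql: str, params=()):
    """Run a query that is expected to match zero or one row.

    Returns the row, or None if nothing matched. If a second row
    exists, the query (or the data) is broken, so raise instead of
    silently picking one the way a blanket LIMIT 1 would.
    """
    rows = conn.execute(sql, params).fetchmany(2)  # two rows are enough to detect the problem
    if len(rows) > 1:
        raise LookupError("expected at most one row, got several")
    return rows[0] if rows else None
```

A lookup by a supposedly unique column then either returns its row, returns None, or raises as soon as duplicate data shows up, which is the error a needless LIMIT 1 would have hidden.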
The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.
Remember the left-pad crisis that broke the Internet because a developer removed his npm packages over a dispute? How hard is it to write a left-pad function?
Sorry but no.
You should not be copy-pasting a left pad function.
But, you should not be re-implementing yet another one yourself either.
Simple trivial task like this *should go into a standard library*.
On any machine on which I fire up a C compiler, I know that at least I can rely on a decent compliant standard library for simple tasks.
If I want to left-pad a number, I just give the appropriate parameter to printf.
(Well unless I'm writing kernel code, or unless I'm writing for a tiny embedded platform where every byte counts and I need as specific code as possible).
Why does npm need to be any different?
---
Also, a left-padding function might not be as trivial as you think. Not every language is English, not every language is written only with the ASCII subset of Unicode. Some weird corner cases will start to pop up. Think of situations where the number of bytes IS NOT the number of Unicode code points, which in turn IS NOT the number of displayed characters (e.g. some of the code points are diacritics or other such modifiers).
(To think about worst case scenarios: How do you even left-pad Zalgo ?)
But these are indeed extreme cases.
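The bytes / code points / displayed characters distinction is easy to see in a few lines. Below is a sketch in Python, under the simplifying assumption that a "displayed character" is any code point that is not a combining mark. That is only an approximation: it ignores double-width East Asian characters and emoji sequences. The names `display_length` and `left_pad` are made up for the example.

```python
import unicodedata


def display_length(text: str) -> int:
    """Approximate the number of displayed characters.

    Counts code points that are not combining marks, so a base letter
    followed by a combining accent counts once.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Left-pad to a displayed width rather than a code point count."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = max(0, width - display_length(text))
    return fill * missing + text


# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character
accented = "e\u0301"
byte_count = len(accented.encode("utf-8"))   # bytes in UTF-8
code_points = len(accented)                  # Unicode code points
shown = display_length(accented)             # approximate displayed characters
```

For that string the three counts are 3, 2 and 1 respectively, so the built-in `str.rjust`, which pads by code points, comes out one column short where `left_pad` does not. For plain ASCII numbers the two agree.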
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
We shouldn't be asking why people are copying bad code, we should be asking why they need to.
Not sure why you're rephrasing my statement.
There's a subtle difference.
Your statement clearly poses re-implementation of code as the main alternative to copy-pasting. (boils down to "You should intelligently re-implement, instead of blindly copy-pasting").
The above statement simply discourages from copy-pasting (boils down to "Do not copy-paste, why do you even want to ?") but is still open to *any* solution :
that includes having a standard library (which was another criticism back during the "#LeftPadGate"), which is also a valid solution: if there is a decent standard library, nobody will need to copy-paste anymore (but nobody will need to re-implement either).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I find the majority of online tutorials are very naive with regard to development best practice and Stack Overflow is one of the worst offenders for perpetuating this problem. It is a vicious circle. Simplistic solutions get upvoted while better answers take longer to prepare and are comprehended by fewer people.
It is common to see answers presented that ignore the bigger picture, for example, a poor separation of concerns and high levels of coupling are the norm.
If you call this out your answer will often be downvoted or challenged based on the evidence from simple tutorials or because someone has been gaming Stack Overflow. Meanwhile evidence such as the Portland Pattern Repository at C2 is ignored or outright dismissed.
Other tech areas that exhibit the same problem are Cucumber BDD (first person pronouns are an endemic anti-pattern) and Selenium WebDriver tutorials, using highly qualified locators and conflating representation with the test code.
When you need to continuously re-coach development best practices to teams as an automation consultant it can get very tiresome very quickly.
The copy-pasta "cause" is obvious, but the existence of the flaws and their propagation are symptomatic of a complex subject matter (i.e. domain, functionality, design, and/or API).
What? You mean if you teach someone to write vulnerable code, they then tend to write vulnerable code? Get outa here! Whodathunkit? I hope they didn't spend too much on this "study."
"Find out where W3Fools is blocked. That will tell you who is behind it!"
Well, since you admit blocking it....
And it's Google's fault for returning the tutorial on page one of the results. What, they're using an automated algorithm? So they need to adjust their algorithm.
Or ask for support from wherever they copied it.
I'm under the impression that JavaScript doesn't have a standard library outside of its core functionality, but it does have a ton of frameworks available.
Yup, that's exactly my (poorly worded) complaint.
Tons of semi-useful frameworks everywhere,
but not a basic library of standard functions.
Leading to either tons of copy-pasting, or relying on scattered external modules.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Instead of actually being COMPETENT, we now have armies of phoney coders who know little about what they are doing because all they're really doing is pasting together lots of blobs of crappy code they found on the internet.
Any business employing these fake programmers deserves all the badness that comes later when their junk collapses and the fake programmers are unable to fix it because they never fully understood how it sortof worked since they never actually wrote most of it.
Before the internet, this particular problem did not really exist.
Hey, you kids, GIT OFFA MY LAWN! (or at least learn to actually write and understand your own code)
Maybe if motherfuckers writing APIs would fucking document their shit like they did back in the days of "Real Programmers", that wouldn't be a problem.
It's the developer's fault for copy pasting. It's management's fault for under-hiring or hiring poorly, or not having a full time security analyst. It's Google's fault for not checking all online code tutorials for security bugs, and for that matter everybody's web site for the same bugs. It's the attacker's fault for being evil. Let's not leave out the media, president Whoever, the Russians, and ISIS. Actually it's all my fault, I'm dreaming all of this.