Python's Official Repository Included 10 'Malicious' Typo-Squatting Modules (bleepingcomputer.com)
An anonymous reader quotes BleepingComputer:
The Slovak National Security Office (NBU) has identified ten malicious Python libraries uploaded on PyPI -- Python Package Index -- the official third-party software repository for the Python programming language. NBU experts say attackers used a technique known as typosquatting to upload Python libraries with names similar to legitimate packages -- e.g.: "urlib" instead of "urllib." The PyPI repository does not perform any types of security checks or audits when developers upload new libraries to its index, so attackers had no difficulty in uploading the modules online.
Developers who mistyped the package name loaded the malicious libraries in their software's setup scripts. "These packages contain the exact same code as their upstream package thus their functionality is the same, but the installation script, setup.py, is modified to include a malicious (but relatively benign) code," NBU explained. Experts say the malicious code only collected information on infected hosts, such as name and version of the fake package, the username of the user who installed the package, and the user's computer hostname. Collected data, which looked like "Y:urllib-1.21.1 admin testmachine", was uploaded to a Chinese IP address. NBU officials contacted PyPI administrators last week who removed the packages before officials published a security advisory on Saturday.
The advisory lays some of the blame on Python's 'pip' tool, which executes arbitrary code during installations without requiring a cryptographic signature.
Ars Technica also reports that another team of researchers "was able to seed PyPI with more than 20 libraries that are part of the Python standard library," and that group now reports they've already received more than 7,400 pingbacks.
I use pip install all the time...well pip3 install
PyPI is great, but they could increase their security a bit and still keep the same level of functionality. This malware is kind of obvious, or at least it seems like it should be obvious to security people.
I remember thinking on more than a few occasions that PyPI could be easily misused by beginners.
Thank you Dave Raggett
OMG. Trump is typosquatting? Impeach him IMMEDIATELY!!!!
What the hell would that change?
The vector here is people asking for a module whose name is similar to the one they want, and pip installing exactly the module they mistakenly asked for. There is no reason any cryptographic signature check would fail.
The only marginal finger-pointing possible here is at PyPI for allowing typosquatting, however even that is marginal.
Basically, if you are installing modules from 'dah internetz', you should take just a little care, perhaps?
Unless what they are trying to claim is that walled garden security is somehow better? I doubt they would get many within the python community to agree.
Russians HACKED into the ELECTIONS and DID THIS! oh nooooes!!!
-Beau
...Russia!
Turning Point USA is a right wing organization with chapters on many college campuses. They claim to support the principles of fiscal responsibility, free markets, and limited government. They also claim that the free speech of conservatives is threatened and that they advocate for free speech.
In 2016, Turning Point USA created a professor watchlist that claims to be a list of faculty who promote radical leftist ideas in the classroom. Examining the list reveals a number of faculty members who have engaged in egregious conduct such as requiring their students to sign a pledge to vote for a particular candidate. Faculty should not be doing that in the classroom, and it shouldn't be controversial to remove faculty who attempt to force their political views on students, regardless of what those views are. However, many others are on the list for expressing their opinions through op-eds, Twitter posts, and television interviews. Turning Point USA appears to believe that faculty with viewpoints they disagree with should not have free speech outside the classroom.
Make no mistake, this is a blacklist, for the purpose of silencing the free speech of faculty outside of the classroom if that speech is contrary to the right-wing views of Turning Point USA. They support your free speech so long as you agree with their political views. If Turning Point USA had its way, public universities would dismiss faculty members for expressing their first amendment rights outside the classroom. This is neither freedom nor limited government.
It is every bit as troubling that some of their supporters are calling for the tenure system to be abolished. The tenure system has its origins at universities that were affiliated with religious institutions. It sought to protect the academic freedom of professors to examine and conduct research on topics that would be opposed by the religious institution in control of the university. While the tenure system has its flaws, it was developed for the purpose of protecting the free speech and academic freedom of faculty. Abolishing tenure is an attack on free speech.
Turning Point USA should have the freedom of speech to do all of these things, including running their blacklist of professors. I support their right to free speech, and I believe they should be able to speak their ideas on university campuses. However, I intend to use my free speech to expose them for the hypocritical and dangerous organization that they are. It's not their ideas of fiscal responsibility, free markets, and limited government that I find dangerous. I don't. In fact, I even agree with some of those ideas. However, those ideas are a facade to hide the true mission of Turning Point USA, which is to censor views on college campuses that they disagree with. Let's use our free speech to expose the true mission of Turning Point USA.
Seems that was some sort of CP/M program. AMirIght? Early CP/M. Maybe VAX? Or is this some South park thing?
Start thinking for yourselves, assholes.
Either write your own software, or quarantine the hell out of other people's software.
Get a real operating system, such as Linux, and pare that crap down to the bare minimum components that you need; actually read the source code for various critical aspects of your system. Write your own, tailored init programs instead of relying on third-party junk like SystemD.
If you cannot spare the time to audit the software that you are going to run, then at least run software in a jail; at the very least, run software under various accounts—actually make use of Unix permissions for once in your life. Figure out how to set up ACLs, use programming languages that do even a modicum of compile-time checking (unlike this toy language, Python).
Take an active role in your computing; not everything is on StackOverflow for you code monkeys to copy/paste.
scripty junk for math majors who can't handle real languages
The comment provides some really solid insight on the situation.
> but they could increase their security a bit [...]
As long as it's "they" and not "we", we'll have problems.
I've lived for six years in a big corp, and yes, they use Free Software (they call it "opensource" because of... ideology). Conceptually, the spirit was "it's the same as proprietary, less shiny but cheaper". As long as we don't understand the real asset of Free Software (that each of us is part of the damned process), we'll be unable to reap its full benefits.
I'm having less and less sympathy for those who get burnt when they do the equivalent of "curl foo | sudo bash" (be it directly, via PyPI or composer) without investing some thought in it. At least they should re-invest the discount they get on Free into learning and taking part in fixing the Commons.
pip, the Peripheral Interchange Program, was first developed by Digital Equipment Corporation. I don't know which of their architectures it was first developed for, but the PDP-10 version was ported from the PDP-6. It may date back to the PDP-1, but that is prehistory to me. Its claim to fame was that it could copy from any device to any other device. In an era when IBM mainframes had a separate program to copy from device N to device M, there would have to be O(k^2) such programs, where k was the number of different devices (a bit less because you never had to copy from the card reader to the card reader). Back in 1968 I wrote a program that could read any of the several file formats on the disk to any other file format on the disk, or from the disk to the printer, or to or from punched cards, so the multiplicity of file formats on the disk (Sequential Access Method, Indexed Sequential Access Method, and Partitioned Access Method) more than made up for the reduction from unidirectional devices. It was a very challenging program to write. By comparison, pip, when we moved from an IBM mainframe to a Digital Equipment KA-10, was a wonderful tool.
This gave me an idea! I'll launch my own Python repository, called PyPl.
Step 1: Require that package names are treated as case insensitive.
Step 2: Require that all package names be at least 3 characters long.
Step 3: Require that the minimum edit distance between the names of any two packages be at least 1/3rd of the length of the longer name.
Now step 3 will be a problem for some. Let's suppose I develop the package "FooBar", and after it has become semi-popular, some issues need to be addressed that will break compatibility. "FooBar2" will fail step 3 here and some will not like that, but I argue that breaking compatibility *should* lead to an entirely new name, and no, it's not lost on me that Python is the poster child of keeping the same name while breaking compatibility. I don't understand why the developers haven't apologized yet.
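For anyone who wants to play with the three steps above, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `edit_distance` and `name_allowed` are hypothetical helper names, "edit distance" is taken to mean plain Levenshtein distance, and this is not how PyPI or any real index validates names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_allowed(candidate: str, existing: list[str]) -> bool:
    """Apply the three proposed rules to a candidate package name."""
    cand = candidate.lower()                  # step 1: case insensitive
    if len(cand) < 3:                         # step 2: at least 3 characters
        return False
    for other in existing:
        other = other.lower()
        longer = max(len(cand), len(other))
        # step 3: distance must be at least 1/3 of the longer name
        if 3 * edit_distance(cand, other) < longer:
            return False
    return True
```

With these rules, `name_allowed("FooBar2", ["FooBar"])` and `name_allowed("urlib", ["urllib"])` both come back `False`, since each name is one edit away from an existing one.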
"His name was James Damore."
Watch out! The Humpty-Dumpty video is pretty addictive especially if you think of creimer while watching it.
> What the hell would that change?
If it were handled anything remotely like RPM repositories, at least the identity of the author would be different.
urlib and urllib would be submitted by 2 different authors,
meaning that pip would ask either "installing urllib, signed by 0xb00b1e5 'original@author.com' ? [Y/N]" or
"installing urlib, signed by 0xdeadbeef 'evil@hacker.com' ? [Y/N]"
(in a way, that is something that already happens with GitHub repositories, as the author's nickname or the company's/project's name is part of the URL)
it's not much, but if the user has missed a single letter in the name (has happened to me, pip refusing to install 'thony' as that one didn't exist, unlike 'thonny'),
maybe they are better at spotting a whole different author identity
(or maybe not. Maybe most Python users are that careless)
(with their mind so busy paying attention to blank spaces and tabs)
Also, I don't have a clear idea of how the Python community publishes modules on PyPI (I'm more of a Perl guy than a Python guy, I mostly dabbled in PyPI while helping with software deployment on my university's HPC), but if the most common non-core modules are developed by a few known authors (e.g.: key 0xb00b1e5 'original@author.com' has been trusted multiple times already and the user has added it to his whitelist because he needs a lot of modules), then pip suddenly pausing to ask for confirmation of a new, unknown, non-whitelisted key (e.g.: key 0xdeadbeef 'evil@hacker.com' seen for the first time) is sure to stick out like a sore thumb.
(as currently happens with 3rd party RPM repositories, e.g.: SUSE's Open Build System).
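A rough sketch of that "pause on an unknown key" behaviour, in Python. To be clear, pip has no such prompt as far as this thread establishes; `approve_install`, `trusted_keys` and the `ask` callback are all hypothetical names, and the key IDs are just the made-up ones from the example above. No signature is actually verified here, only the allowlist decision is modelled.

```python
from typing import Callable


def approve_install(package: str, key_id: str, owner: str,
                    trusted_keys: set[str],
                    ask: Callable[[str], bool]) -> bool:
    """Install silently for allowlisted keys; pause and ask for unknown ones."""
    if key_id in trusted_keys:
        return True
    prompt = f"installing {package}, signed by {key_id} '{owner}' ? [Y/N]"
    if ask(prompt):
        trusted_keys.add(key_id)  # remember the key for next time
        return True
    return False
```

The point of passing `ask` in as a callback is that the same check could drive an interactive prompt or a non-interactive "always refuse" policy on a build server.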
Yet another way to use cryptography would be to take a cue from GPG's web of trust, or from PKI's root certificates:
we could also imagine authorities that sign several uploaders' keys as trusted.
i.e.: one could imagine a group, called "Python Booster", who don't release modules themselves, but sign the keys of module authors that they consider trusted enough to be in a "Python Booster Module Collection" (and optionally "pip install pbmc" launching a setup.py that installs the whole distribution).
(So if you need a module that is trusted by one of these "module collections" you subscribe to, you'd be a bit better covered).
In practice, that is already the end result of not installing random modules with "pip" and instead using the RPMs provided by your trusted distro, or by a trusted 3rd party repository.
> The only marginal finger-pointing possible here is at PyPI for allowing typosquatting, however even that is marginal.
In addition to the cryptographic solution,
it could also be useful for pypi.org to refuse to automatically open new module repositories for modules whose name isn't beyond a certain Levenshtein distance from every other name present, without a human reviewing the reason behind the close names.
That won't prevent you from making a "LibreBla" fork of an "OpenBlah" project, but it would reduce the easy-to-confuse clones (you'd need to explain to a human operator that "bla2" is a maintained legacy fork of an older pre-API-change version of "bla").
Unlike the current mess on pypi (and on CPAN for that matter).
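As a hedged illustration of the "flag close names for a human" idea: the sketch below uses the standard library's `difflib.get_close_matches`, which ranks by a similarity ratio and is not Levenshtein distance, so it is a swapped-in stand-in. The function name `needs_human_review` and the 0.8 cutoff are assumptions picked for the example, not anything pypi.org does.

```python
import difflib


def needs_human_review(new_name: str, registered: list[str],
                       cutoff: float = 0.8) -> list[str]:
    """Return already-registered names that look confusingly close to new_name.

    An empty list means the name could be opened automatically; a non-empty
    list means a human should look at the request first.
    """
    lowered = [name.lower() for name in registered]
    return difflib.get_close_matches(new_name.lower(), lowered,
                                     n=5, cutoff=cutoff)
```

With that cutoff, "bla2" against "bla" and "urlib" against "urllib" both get flagged, while "LibreBla" against "OpenBlah" passes without review.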
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
> maybe they are better at spotting a whole different author identity
Good luck with that, as email addresses and author usernames can also be typosquatted, and unless you have the resources of Facebook to bruteforce a hash, key IDs aren't going to be as memorable as "boobies" or "dead beef".
> I'm more of a Perl guy than Python guy [...] but if the most common non-core modules are developed by a few known authors
Does CPAN have the same situation where "common non-core modules are developed by a few known authors"?
> Yet another way to use cryptography, would be to take notice from GPG's web of trust
I imagine OpenPGP's web of trust would have two significant practical problems.
Small world isn't as small as some believe
First, the small world problem wouldn't work if there isn't a critical mass of developers who fly internationally to conventions in order to make the web more dense. Or people born with an interpersonal skills disability (such as myself) or who live in a small or medium-size town with few or no other PyPI package developers would have trouble attending even a local key signing party.
Transitivity of trust
Just because you trust someone's identity doesn't mean you trust someone's ability to verify others' identity. This reflects itself as a low weight on edges of the web of trust not adjacent to you, amplifying the "Small world isn't as small as some believe" problem.
> or from PKI's root certificates
Members of the CA/Browser Forum PKI will happily sign a domain-validated certificate for a typosquatted domain.
> it could also be useful that pypi.org refuse to automatically open new modules repositories for modules whose name isn't beyond a certain levenstein distance of other name present
This raises an exception I found to Python's batteries-included philosophy: Levenshtein distance comes with one of Python's major competitors, but it's behind a third-party module in Python.
PyPI isn't the official repository of the Python project; it is a useful adjunct site. It does hold lots of packages that aren't in the official repository. But it's no more the official Python repository than http://ftp.us.debian.org/debia... which also holds a lot of Python packages that are easy to install (on a Debian system).
I think we've pushed this "anyone can grow up to be president" thing too far.
Well...
Considering that setuptools comes with python now and will install from pypi by default if you don't specify a repo, it's at least the unofficial official adjunct repo which still makes the poor security a big deal.
Then explain why DEB, RPM, Maven, CPAN, etc infrastructures work just fine?
I honestly don't know how those work fine. How did the first Debian Maintainer on each continent travel to get his key signed by a Debian Developer, as the process requires?