Python's Official Repository Included 10 'Malicious' Typo-Squatting Modules (bleepingcomputer.com)
An anonymous reader quotes BleepingComputer:
The Slovak National Security Office (NBU) has identified ten malicious Python libraries uploaded on PyPI -- Python Package Index -- the official third-party software repository for the Python programming language. NBU experts say attackers used a technique known as typosquatting to upload Python libraries with names similar to legitimate packages -- e.g.: "urlib" instead of "urllib." The PyPI repository does not perform any types of security checks or audits when developers upload new libraries to its index, so attackers had no difficulty in uploading the modules online.
Developers who mistyped the package name loaded the malicious libraries in their software's setup scripts. "These packages contain the exact same code as their upstream package thus their functionality is the same, but the installation script, setup.py, is modified to include a malicious (but relatively benign) code," NBU explained. Experts say the malicious code only collected information on infected hosts, such as name and version of the fake package, the username of the user who installed the package, and the user's computer hostname. Collected data, which looked like "Y:urllib-1.21.1 admin testmachine", was uploaded to a Chinese IP address. NBU officials contacted PyPI administrators last week who removed the packages before officials published a security advisory on Saturday.
The advisory lays some of the blame on Python's 'pip' tool, which executes arbitrary code during installations without requiring a cryptographic signature.
Ars Technica also reports that another team of researchers "was able to seed PyPI with more than 20 libraries that are part of the Python standard library," and that group now reports they've already received more than 7,400 pingbacks.
I use pip install all the time...well pip3 install
PyPI is great but they could increase their security a bit and still keep the same level of functionality. This malware is kind of obvious, or at least it seems like it should be obvious to security people.
I remember thinking on more than a few occasions that PyPI could be easily misused by beginners.
Thank you Dave Raggett
OMG. Trump is typosquatting? Impeach him IMMEDIATELY!!!!
What the hell would that change?
The vector here is people asking for a module that is named similarly to the one they want, and pip installing exactly the module they are mistakenly asking for - there is no reason that any cryptographic signature check would fail.
The only marginal finger-pointing possible here is at PyPI for allowing typosquatting, however even that is marginal.
Basically, if you are installing modules from 'dah internetz', you should take just a little care, perhaps?
Unless what they are trying to claim is that walled garden security is somehow better? I doubt they would get many within the python community to agree.
Russians HACKED into the ELECTIONS and DID THIS! oh nooooes!!!
-Beau
If you disagree, you're a Putin loving white nationalist!
If you disagree, you're a sexist racist xenophobe!
Seems that was some sort of CP/M program. AMirIght? Early CP/M. Maybe VAX? Or is this some South park thing?
Start thinking for yourselves, assholes.
Either write your own software, or quarantine the hell out of other people's software.
scripty junk for math majors who can't handle real languages
The comment provides some really solid insight on the situation.
In your opinion, everyone should read the source code to the critical parts of their operating system and other software. This is ridiculously impractical and source code audits will be more effective when performed by expert security researchers. While I do think people ought to be more informed about what's running on their computers and the reputation and security practices of the organizations providing that software, it's a far cry from everyone auditing the source code of the software running on their computers.
Take WordPress plugins, for example. It probably isn't necessary for everyone using the plugins to audit the source code of those plugins. Rather, let experts in auditing source code do it, and take the time to research the reputation of the plugin before using it. It isn't necessary for everyone to audit source code when a bit of common sense will suffice.
> but they could increase their security a bit [...]
As long as it's "they" and not "we", we'll have problems.
I've lived for six years in a big corp, and yes, they use Free Software (they call it "opensource" because of... ideology). Conceptually, the spirit was "it's the same as proprietary, less shiny but cheaper". As long as we don't understand the real asset of Free Software (that each of us is part of the damned process), we'll be unable to reap its full benefits.
I'm having less and less sympathy for those who get burnt when they do the equivalent of "curl foo | sudo bash" (be it directly, via PyPI or composer) without investing some thought in it. At least they should re-invest the discount they get on Free into learning and taking part in fixing the Commons.
pip, the Peripheral Interchange Program, was first developed by Digital Equipment Corporation. I don't know which of their architectures it was first developed for, but the PDP-10 version was ported from the PDP-6. It may date back to the PDP-1, but that is prehistory to me. Its claim to fame was that it could copy from any device to any other device. In an era when IBM mainframes had a program to copy from device N to device M, there would have to be O(k^2) where k was the cardinality of different devices (a bit less because you never had to copy from the card reader to the card reader). Back in 1968 I wrote a program that could read any of the several file formats on the disk to any other file format on the disk, or from the disk to the printer, or to or from punched cards, so the multiplicity of file formats on the disk (Sequential Access Method, Indexed Sequential Access Method, and Partitioned Access Method) more than made up for the reduction from unidirectional devices. It was a very challenging program to write. By comparison, pip, when we moved from an IBM mainframe to a Digital Equipment KA-10, was a wonderful tool.
This gave me an idea! I'll launch my own Python repository, called PyPl.
Step 1: Require that package names are treated as case insensitive.
Step 2: Require that all package names be at least 3 characters long.
Step 3: Require that the minimum edit distance between the names of any two packages be at least 1/3rd of the length of the longer name.
Now step 3 will be a problem for some. Let's suppose I develop the package "FooBar" and, once it has become semi-popular, some issues need to be addressed that will break compatibility. "FooBar2" will fail step 3 here and some will not like that, but I argue that compatibility breaking *should* lead to an entirely new name, and no, it's not lost on me that Python is the poster child of keeping the same name while breaking compatibility. I don't understand why the developers haven't apologized yet.
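The three steps above can be sketched in a few lines of Python. This is only an illustration of the proposed rule, not anything PyPI actually does; the function names `levenshtein` and `name_allowed` are made up for this example, and "edit distance" is assumed to mean plain Levenshtein distance (insert, delete, substitute).

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_allowed(candidate: str, existing: Iterable[str]) -> bool:
    """Hypothetical registration check implementing steps 1 to 3."""
    cand = candidate.lower()              # step 1: case-insensitive
    if len(cand) < 3:                     # step 2: minimum length
        return False
    for other in existing:
        other = other.lower()
        longer = max(len(cand), len(other))
        # step 3: distance must be at least 1/3 of the longer name
        if 3 * levenshtein(cand, other) < longer:
            return False
    return True
```

Under this sketch both "urlib" (against "urllib") and "FooBar2" (against "FooBar") are rejected, since each is a single edit away from a name that is six or seven characters long.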
"His name was James Damore."
Watch out! The Humpty-Dumpty video is pretty addictive especially if you think of creimer while watching it.
What the hell would that change?
If it were handled anything remotely like the way it is in RPM repositories, at least the identity of the author would be different.
urlib and urllib would be submitted by 2 different authors.
meaning that pypi would either show "installing urllib, signed by 0xb00b1e5 'original@author.com' ? [Y/N]" or
"installing urllib, signed by 0xdeadbeef 'evil@hacker.com' ? [Y/N] "
(in a way, that is something that already is happening with GitHub repository as the author's nickname or the company's/project names are part of the URL)
it's not much, but if the user has missed a single letter in the name (has happened to me, pip refusing to install 'thony' as that one didn't exist, unlike 'thonny'),
maybe they are better at spotting a whole different author identity
(or maybe not. Maybe most python users are that much careless)
(with their mind so busy paying attention to blank spaces and tabs)
Also, I don't have a clear idea of the python community publishing modules on pypi (I'm more of a Perl guy than Python guy, I mostly dabbled into pypi while helping software deployment on my university's HPC) but if the most common non-core modules are developed by a few known authors (e.g.: key 0xb00b1e5 'original@author.com' has been trusted multiple times already and the user has added it to his whitelist because he needs a lot of modules) then pip suddenly pausing to ask confirmation for a new unknown, non-whitelisted key (e.g.: key 0xdeadbeef 'evil@hacker.com' seen for the first time) is sure to suddenly stand out as a sore thumb.
(as currently happens with 3rd party RPM repositories, e.g.: SUSE's Open Build System).
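A rough sketch of that whitelist behaviour, in Python. To be clear, pip has no such feature; `SignerAllowlist`, `install_decision` and the `confirm` callback are hypothetical names invented here, and no real signature verification is performed. It only shows the "stay quiet for known keys, pause for unknown ones" decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SignerAllowlist:
    """Hypothetical per-user store of key IDs already trusted."""
    trusted: Set[str] = field(default_factory=set)

    def needs_confirmation(self, key_id: str) -> bool:
        return key_id.lower() not in self.trusted

    def trust(self, key_id: str) -> None:
        self.trusted.add(key_id.lower())


def install_decision(package: str, key_id: str, signer: str,
                     allowlist: SignerAllowlist,
                     confirm: Callable[[str], bool]) -> bool:
    """Return True if the (imaginary) installer should proceed."""
    if not allowlist.needs_confirmation(key_id):
        return True  # known key: install without interrupting the user
    question = f"installing {package}, signed by {key_id} '{signer}' ? [Y/N]"
    if confirm(question):
        allowlist.trust(key_id)  # remember the key for next time
        return True
    return False
```

In an interactive tool `confirm` could be something like `lambda q: input(q + " ").strip().lower() == "y"`; passing it in as a callback keeps the decision logic testable.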
Yet another way to use cryptography, would be to take notice from GPG's web of trust, or from PKI's root certificates :
we could also imagine authorities that sign several uploader's keys as trusted.
i.e.: one could imagine a group, called "Python Booster" who don't release modules themselves, but sign the keys of modules that they consider trusted to be in a "Python Booster Module Collection". (and optionally "pip install pbmc" launching a setup.py that installs the whole distribution).
(So if you need a module that is trusted by one of these "module collections" you subscribe to, you'd be a bit better covered).
In practice, that is already the end result of not installing random modules with "pip" but using the RPMs provided by your trusted distro, or by a trusted 3rd party repository.
The only marginal finger-pointing possible here is at PyPI for allowing typosquatting, however even that is marginal.
In addition to the cryptographic solution,
it could also be useful that pypi.org refuse to automatically open new module repositories for modules whose name isn't beyond a certain Levenshtein distance of another name already present, without a human reviewing the reason behind the close names.
That won't prevent you from making a "LibreBla" fork of an "OpenBlah" project, but that would reduce the easy-to-confuse clones (you'd need to explain to a human operator that "bla2" is a maintained legacy fork of an older pre-API-change version of "bla").
Unlike the current mess on pypi (and on CPAN for that matter).
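For a feel of how such a "hold for human review" gate might behave, here is a small sketch using only the standard library. Note the swap: it uses `difflib.get_close_matches`, which ranks by a similarity ratio and is not Levenshtein distance, simply because it ships with Python. The function name `needs_human_review` and the 0.8 cutoff are assumptions picked for illustration.

```python
import difflib
from typing import Iterable, List


def needs_human_review(new_name: str, existing_names: Iterable[str],
                       cutoff: float = 0.8) -> List[str]:
    """Return existing names that look confusingly close to new_name.

    An empty list means the name could be opened automatically; a
    non-empty list means a human should look at it first.
    """
    # Compare case-insensitively but report the original spelling.
    lowered = {name.lower(): name for name in existing_names}
    candidate = new_name.lower()
    if candidate in lowered:
        return [lowered[candidate]]
    close = difflib.get_close_matches(candidate, list(lowered),
                                      n=5, cutoff=cutoff)
    return [lowered[match] for match in close]
```

With these settings "bla2" gets flagged against "bla" and "urlib" against "urllib", while "LibreBla" passes next to "OpenBlah", which matches the intent described above.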
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Cut him a break, hey?
The guy is tired after building his own refrigerator, dishwasher, air conditioning and car from trade spares.
CAPTCHA: remodels.
maybe they are better at spotting a whole different author identity
Good luck with that, as email addresses and author usernames can also be typosquatted, and unless you have the resources of Facebook to bruteforce a hash, key IDs aren't going to be as memorable as "boobies" or "dead beef".
I'm more of a Perl guy than Python guy [...] but if the most common non-core modules are developed by a few known authors
Does CPAN have the same situation where "common non-core modules are developed by a few known authors"?
Yet another way to use cryptography, would be to take notice from GPG's web of trust
I imagine OpenPGP's web of trust would have two significant practical problems.
Small world isn't as small as some believe: First, the small world problem wouldn't work if there isn't a critical mass of developers who fly internationally to conventions in order to make the web more dense. Or people born with an interpersonal skills disability (such as myself) or who live in a small or medium-size town with few or no other PyPI package developers would have trouble attending even a local key signing party.
Transitivity of trust: Just because you trust someone's identity doesn't mean you trust someone's ability to verify others' identity. This reflects itself as a low weight on edges of the web of trust not adjacent to you, amplifying the "Small world isn't as small as some believe" problem.
or from PKI's root certificates
Members of the CA/Browser Forum PKI will happily sign a domain-validated certificate for a typosquatted domain.
it could also be useful that pypi.org refuse to automatically open new modules repositories for modules whose name isn't beyond a certain Levenshtein distance of other name present
This raises an exception I found to Python's batteries-included philosophy: Levenshtein distance comes with one of Python's major competitors, but it's behind a third-party module in Python.
PyPI isn't the official repository of the Python project; it is a useful adjunct site. It does hold lots of packages that aren't in the official repository. But it's no more the official Python repository than http://ftp.us.debian.org/debia... which also holds a lot of Python packages that are easy to install (on a Debian system).
I think we've pushed this "anyone can grow up to be president" thing too far.
Shut the fuck up Trump supporter!!!
Well...
Considering that setuptools comes with python now and will install from pypi by default if you don't specify a repo, it's at least the unofficial official adjunct repo which still makes the poor security a big deal.
Then explain why DEB, RPM, Maven, CPAN, etc infrastructures work just fine?
I honestly don't know how those work fine. How did the first Debian Maintainer on each continent travel to get his key signed by a Debian Developer, as the process requires?