Python's Official Repository Included 10 'Malicious' Typo-Squatting Modules (bleepingcomputer.com)
An anonymous reader quotes BleepingComputer:
The Slovak National Security Office (NBU) has identified ten malicious Python libraries uploaded on PyPI -- Python Package Index -- the official third-party software repository for the Python programming language. NBU experts say attackers used a technique known as typosquatting to upload Python libraries with names similar to legitimate packages -- e.g.: "urlib" instead of "urllib." The PyPI repository does not perform any types of security checks or audits when developers upload new libraries to its index, so attackers had no difficulty in uploading the modules online.
Developers who mistyped the package name loaded the malicious libraries in their software's setup scripts. "These packages contain the exact same code as their upstream package thus their functionality is the same, but the installation script, setup.py, is modified to include a malicious (but relatively benign) code," NBU explained. Experts say the malicious code only collected information on infected hosts, such as name and version of the fake package, the username of the user who installed the package, and the user's computer hostname. Collected data, which looked like "Y:urllib-1.21.1 admin testmachine", was uploaded to a Chinese IP address. NBU officials contacted PyPI administrators last week who removed the packages before officials published a security advisory on Saturday."
The advisory lays some of the blame on Python's 'pip' tool, which executes arbitrary code during installations without requiring a cryptographic signature.
Ars Technica also reports that another team of researchers "was able to seed PyPI with more than 20 libraries that are part of the Python standard library," and that group now reports they've already received more than 7,400 pingbacks.
And how would cryptographic signing even help?
That way you can be sure that if you download malware, it's not tampered with.
lucm, indeed.
The thing is, typos happen.
Sometimes it's not even that. Beautiful Soup is the module for parsing HTML and XML files. However, beautifulsoup (bs) is the legacy version and beautifulsoup4 (bs4) is the version that everyone should be using. It's very easy to install the former when you need the latter.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup
This gave me an idea! I'll launch my own Python repository, called PyPl.
What the hell would that change?
If it's handled in anything remotely like the way RPM repositories do it, at least the identity of the author is different.
urlib and urllib would be submitted by 2 different authors.
meaning that pip would show either "installing urllib, signed by 0xb00b1e5 'original@author.com' ? [Y/N]" or
"installing urllib, signed by 0xdeadbeef 'evil@hacker.com' ? [Y/N] "
(in a way, that is already happening with GitHub repositories, as the author's nickname or the company's/project's name is part of the URL)
it's not much, but if the user has missed a single letter in the name (has happened to me, pip refusing to install 'thony' as that one didn't exist, unlike 'thonny'),
maybe they are better at spotting a whole different author identity
(or maybe not. Maybe most python users are that careless)
(with their mind so busy paying attention to blank spaces and tabs)
Also, I don't have a clear idea of how the python community publishes modules on pypi (I'm more of a Perl guy than Python guy, I've mostly dabbled in pypi while helping with software deployment on my university's HPC), but if the most common non-core modules are developed by a few known authors (e.g.: key 0xb00b1e5 'original@author.com' has been trusted multiple times already and the user has added it to his whitelist because he needs a lot of modules), then pip suddenly pausing to ask confirmation for a new unknown, non-whitelisted key (e.g.: key 0xdeadbeef 'evil@hacker.com' seen for the first time) is sure to stick out like a sore thumb.
(as currently happens with 3rd party RPM repositories, e.g.: SUSE's Open Build System).
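To make the whitelist idea concrete, here is a minimal sketch of the decision such an installer might make. This is not how pip works today; the function name `install_decision` and the `trusted_keys` set are made up for illustration, and the sketch assumes the signature itself has already been verified elsewhere.

```python
def install_decision(package, key_id, signer, trusted_keys):
    """Return the message a hypothetical installer would show.

    trusted_keys is the set of key IDs the user has already whitelisted.
    No cryptography happens here; signature verification is assumed
    to have been done before this function is called.
    """
    if key_id in trusted_keys:
        # Known key: proceed without bothering the user.
        return f"installing {package}, signed by {key_id} '{signer}'"
    # Key never seen before: pause and ask for confirmation.
    return (f"installing {package}, signed by UNKNOWN key "
            f"{key_id} '{signer}' ? [Y/N]")


trusted = {"0xb00b1e5"}
print(install_decision("urllib", "0xb00b1e5", "original@author.com", trusted))
print(install_decision("urlib", "0xdeadbeef", "evil@hacker.com", trusted))
```

The second call is the one that would make the installer stop and ask, which is the whole point: the typo in the package name is easy to miss, the unfamiliar key is flagged explicitly.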
Yet another way to use cryptography, would be to take notice from GPG's web of trust, or from PKI's root certificates :
we could also imagine authorities that sign several uploaders' keys as trusted.
i.e.: one could imagine a group, called "Python Booster", who don't release modules themselves, but sign the keys of modules that they consider trusted enough to be in a "Python Booster Module Collection" (and optionally "pip install pbmc" launching a setup.py that installs the whole distribution).
(So if you need a module that is trusted by one of these "module collections" you subscribe to, you'd be a bit better covered).
In practice, that is already the end result of not installing random modules with "pip" but using the RPMs provided by your trusted distro, or by a trusted 3rd party repository.
The only marginal finger-pointing possible here is at PyPI for allowing typo squatting, however even that is marginal.
In addition to the cryptographic solution,
it could also be useful that pypi.org refuse to automatically open new modules repositories for modules whose name isn't beyond a certain Levenshtein distance of other name present, without a human reviewing the reason behind close names.
That won't prevent you from making a "LibreBla" fork of an "OpenBlah" project, but it would reduce the number of easy-to-confuse clones (you'd need to explain to a human operator that "bla2" is a maintained legacy fork of an older pre-API-change version of "bla").
Unlike the current mess on pypi (and on CPAN for that matter).
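For what it's worth, the name check is only a few lines. Below is a rough sketch in plain Python: a textbook dynamic-programming Levenshtein distance plus a hypothetical `needs_human_review` helper. The threshold of 1 and the helper's name are my own assumptions, not anything pypi.org actually does.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def needs_human_review(new_name, existing_names, max_distance=1):
    """Return existing names within max_distance edits of new_name.

    An empty list means the new name could be registered automatically;
    anything else would go to a human reviewer.
    """
    new = new_name.lower()
    return sorted(
        name for name in existing_names
        if name.lower() != new
        and levenshtein(new, name.lower()) <= max_distance
    )


existing = ["urllib", "thonny", "bla", "OpenBlah"]
for candidate in ["urlib", "thony", "bla2", "LibreBla"]:
    print(candidate, needs_human_review(candidate, existing))
```

With that threshold "urlib", "thony" and "bla2" all get held for review, while "LibreBla" goes straight through. A real registry would have to tune the threshold and index the names rather than scan them all, but the principle is the same.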
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
maybe they are better at spotting a whole different author identity
Good luck with that, as email addresses and author usernames can also be typosquatted, and unless you have the resources of Facebook to bruteforce a hash, key IDs aren't going to be as memorable as "boobies" or "dead beef".
I'm more of a Perl guy than Python guy [...] but if the most common non-core modules are developed by a few known authors
Does CPAN have the same situation where "common non-core modules are developed by a few known authors"?
Yet another way to use cryptography, would be to take notice from GPG's web of trust
I imagine OpenPGP's web of trust would have two significant practical problems.
Small world isn't as small as some believe
First, the small world problem wouldn't work if there isn't a critical mass of developers who fly internationally to conventions in order to make the web more dense. Or people born with an interpersonal skills disability (such as myself) or who live in a small or medium-size town with few or no other PyPI package developers would have trouble attending even a local key signing party.
Transitivity of trust
Just because you trust someone's identity doesn't mean you trust someone's ability to verify others' identity. This reflects itself as a low weight on edges of the web of trust not adjacent to you, amplifying the "Small world isn't as small as some believe" problem.
or from PKI's root certificates
Members of the CA/Browser Forum PKI will happily sign a domain-validated certificate for a typosquatted domain.
it could also be useful that pypi.org refuse to automatically open new modules repositories for modules whose name isn't beyond a certain Levenshtein distance of other name present
This raises an exception I found to Python's batteries-included philosophy: Levenshtein distance comes with one of Python's major competitors, but it's behind a third-party module in Python.
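To be fair, the standard library does ship something in the neighborhood: `difflib.get_close_matches`. It ranks candidates by `SequenceMatcher`'s similarity ratio rather than by Levenshtein distance, so treat the sketch below as an approximation. The `suggest` wrapper, the package list and the 0.8 cutoff are illustrative assumptions.

```python
import difflib


def suggest(name, known_packages, cutoff=0.8):
    """Return the closest known package name, or None if nothing is close.

    Uses difflib's similarity ratio (not Levenshtein distance).
    """
    matches = difflib.get_close_matches(name, known_packages, n=1, cutoff=cutoff)
    return matches[0] if matches else None


known = ["urllib3", "requests", "beautifulsoup4", "thonny"]
print(suggest("thony", known))     # thonny
print(suggest("zzzzzzzz", known))  # None
```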