Slashdot Mirror


Python's Official Repository Included 10 'Malicious' Typo-Squatting Modules (bleepingcomputer.com)

An anonymous reader quotes BleepingComputer: The Slovak National Security Office (NBU) has identified ten malicious Python libraries uploaded on PyPI -- Python Package Index -- the official third-party software repository for the Python programming language. NBU experts say attackers used a technique known as typosquatting to upload Python libraries with names similar to legitimate packages -- e.g.: "urlib" instead of "urllib." The PyPI repository does not perform any types of security checks or audits when developers upload new libraries to its index, so attackers had no difficulty in uploading the modules online.

Developers who mistyped the package name loaded the malicious libraries in their software's setup scripts. "These packages contain the exact same code as their upstream package thus their functionality is the same, but the installation script, setup.py, is modified to include a malicious (but relatively benign) code," NBU explained. Experts say the malicious code only collected information on infected hosts, such as name and version of the fake package, the username of the user who installed the package, and the user's computer hostname. Collected data, which looked like "Y:urllib-1.21.1 admin testmachine", was uploaded to a Chinese IP address. NBU officials contacted PyPI administrators last week who removed the packages before officials published a security advisory on Saturday.

The advisory lays some of the blame on Python's 'pip' tool, which executes arbitrary code during installations without requiring a cryptographic signature.

Ars Technica also reports that another team of researchers "was able to seed PyPI with more than 20 libraries that are part of the Python standard library," and that group now reports they've already received more than 7,400 pingbacks.

24 of 69 comments

  1. pip by globaljustin · · Score: 1

    I use pip install all the time...well pip3 install

    PyPI is great but they could increase their security a bit and still keep the same level of functionality. This malware is kind of obvious, or at least it seems like it should be obvious to security people.

    I remember thinking on more than a few occasions that PyPI could be easily misused by beginners.

    --
    Thank you Dave Raggett
    1. Re:pip by Z00L00K · · Score: 1

      And how would cryptographically signed even help?

      Anyone letting a package into a library site needs to verify it before it can be downloaded.

      If you download stuff from an unofficial library then you are on your own. But most of the unofficial sites are friendly though, so don't be too scared.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    2. Re:pip by lucm · · Score: 5, Funny

      And how would cryptographically signed even help?

      That way you can be sure that if you download malware, it's not tampered with.

      --
      lucm, indeed.
    3. Re:pip by lkcl · · Score: 1

      I use pip install all the time...well pip3 install

      PyPI is great but they could increase their security a bit and still keep the same level of functionality.

      it's actually incredibly comprehensive and extremely involved. for a completely separate team, i'm just in the process of writing up the requirements (following software engineering practices) which cover exactly this scenario: you can read them here if you like (note: they're in development and undergoing review): http://lkcl.net/reports/wot/

      basically from that MASSIVE list - a whopping EIGHTEEN separate and distinct requirements and that's not even getting into implementation details - you should be getting that familiar sinking feeling that what you're asking for is simply... too much for the pypi team to handle on their own. to expect them to be able to do a full verification of each and absolutely every single one of the packages - in fact to even keep their *own website* secure from attack - is simply too much.

      what *would* work is if the pypi team told all uploaders that the entire pypi infrastructure is converting over to a secure web-of-trust: that it is now following standard best practices followed for decades now by absolutely every single distro. namely: that uploaders are required to engage in key-signing parties and to register in a web-of-trust; that uploaders must then digitally GPG-sign their packages; and that pypi will only authorise a package as being online in the pypi index when they have GPG-signed a SHA2 checksum of the complete and full listing of every single package available for download on the entire pypi site.

      new package uploaders would then also need to be "approved" - it would need to become impossible for just any arbitrary-named package to be uploaded, as their GPG key would need to be verified as being part of the web-of-trust. this would then stop dead in its tracks the exact sort of thing that's come up (but also provide the level of trust and reassurance in every single package which is completely missing right now).

      basically, pypi needs to follow the exact same standard practices as any GNU/Linux Distro, and, to be absolutely bluntly honest, anyone who downloads arbitrarily untrusted software (like they do with windows, and including people who use ubuntu and download arbitrary .deb files, bypassing the entire purpose of the GPG web-of-trust behind apt-get and aptitude), gets precisely and exactly what they deserve. yes i have had acquaintances who have blithely downloaded a trojan'ed .deb package because it happened to have the same name. no he didn't bother to check its provenance.

      so, justin, may i respectfully recommend that if peace of mind is important to you, and you also wish to not have to do a full audit of the source code that you're downloading, that you use a GNU/Linux Distro only, and STOP using pip and pypi? if you're using a mac or using windows, you could at least have a mirror-machine where you do (if it's debian) "apt-get install python-mysqldb" or "apt-get source python-mysqldb" and then copy that over?

      at least in that way you will save yourself some time but also you know that someone - somewhere has staked their public reputation and career on a very public declaration that they have at least done _some_ sort of checking on the source code that they have GPG-signed and uploaded into a distro's package repository. if it's too out-of-date or is just not included, *then* you can use pip or just grab the .tgz source archive for yourself, and do some sort of auditing.

    4. Re:pip by lkcl · · Score: 1

      And how would cryptographically signed even help?

      That way you can be sure that if you download malware, it's not tampered with.

      all it tells you is, the signature was valid. whilst it links the file *to* the signature, it doesn't tell you anything about the trustworthiness of the PERSON. for that, you need much much more than just a legitimate signature: you need a full web-of-trust and for the package uploaders to be involved in key-signing parties, where they've basically (collectively) staked their reputation on trusting the ACTUAL identity. this becomes incredibly hard to compromise when there are multiple people involved. nobody dares try to game such a system: it's a variant of the "prisoner's dilemma" except with a thousand or more people.

    5. Re:pip by arth1 · · Score: 1

      That way you can be sure that if you download malware, it's not tampered with.

      all it tells you is, the signature was valid

      Whoosh!

    6. Re:pip by tepples · · Score: 1

      A key-signing party will let you verify the identity of people living in the same city who have attended the same key-signing party as you. How will it let you verify someone on another continent, especially when you have no way of verifying the trustworthiness of the intermediate signers who verify other people?

    7. Re:pip by tepples · · Score: 1

      new package uploaders would then also need to be "approved" - it would need to become impossible for just any arbitrary-named package to be uploaded, as their GPG key would need to be verified as being part of the web-of-trust.

      Then how would a new developer enter the web of trust without traveling internationally to a key-signing party?

      if you're using a mac or using windows, you could at least have a mirror-machine where you do (if it's debian) "apt-get install python-mysqldb" or "apt-get source python-mysqldb" and then copy that over?

      Good luck with that when after having installed Debian for the first time on your mirror-machine, your mirror-machine can't connect to the network because its NIC is unsupported.

    8. Re:pip by david_thornley · · Score: 1

      There are people I would calmly trust my life to. Some of them, in my opinion, don't have really good judgment, and I'm not sure I want to trust their friends. That's the problem I see with webs of trust.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    9. Re:pip by tepples · · Score: 1

      How did the first Debian Maintainer in each country travel to get his key signed? DebConf hasn't been around long enough to have held one in each country. And even for those countries in which DebConf has been held, how did the first Debian Maintainer in each state or province travel to get his key signed?

      Is there a way to verify identity that doesn't involve spending months' minimum wage to travel hundreds of kilometers?

  2. cryptographic signature? by thesupraman · · Score: 1

    What the hell would that change?

    The vector here is people asking for a module that is named similarly to the one they want, and pip installing exactly the module they mistakenly asked for. There is no reason that any cryptographic signature check would fail.
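    The point above can be sketched in a few lines. This is only an illustration: a SHA-256 digest stands in for a real signature, and the `index` dictionary, its contents, and the `install` helper are hypothetical, not anything pip or PyPI actually uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of an archive's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, published_digest: str) -> bool:
    """Integrity check: does the download match what the uploader published?"""
    return sha256_hex(data) == published_digest


# Hypothetical package contents, invented for this sketch.
legit = b"def urlopen(url): ..."
squat = legit + b"\n# extra code added by the squatter\n"

# Hypothetical index: each uploader publishes a digest of their OWN archive.
index = {
    "urllib": (legit, sha256_hex(legit)),
    "urlib": (squat, sha256_hex(squat)),
}


def install(name: str) -> bool:
    """Fetch by the (possibly mistyped) name and run the integrity check."""
    data, digest = index[name]
    return verify_download(data, digest)
```

    The mistyped name passes verification because the squatter's archive matches the squatter's own digest. The check only fails when an archive is compared against someone else's digest, which is not what happens when the wrong name is requested.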

    The only marginal finger-pointing possible here is at PyPI for allowing typo squatting, however even that is marginal.

    Basically, if you are installing modules from 'dah internetz', you should take just a little care, perhaps?

    Unless what they are trying to claim is that walled garden security is somehow better? I doubt they would get many within the python community to agree.

    1. Re:cryptographic signature? by cdreimer · · Score: 2

      The thing is, typos happen.

      Sometimes it's not even that. Beautiful Soup is the module for parsing HTML and XML files. However, beautifulsoup (bs) is the legacy version and beautifulsoup4 (bs4) is the version that everyone should be using. It's very easy to install the former when you need the latter.

      https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup

    2. Re:cryptographic signature? by thesupraman · · Score: 1

      And how exactly would a cryptographic signature help that?
      It would just certify that you had actually installed the typo squatter's clone correctly....

      Unless there was a central controller of who was allowed to publish python extensions.

      Is that what you want? It's called a walled garden...

    3. Re:cryptographic signature? by flargleblarg · · Score: 1

      distinguish "rm -rf /var/tmp/install" from "rm -rf / var/tmp/install".

      Oh, come on. You know it's supposed to be
      rm -fr /var/tmp/install

  3. Re: Personal Responsibility by cdreimer · · Score: 1

    How does one quarantine an operating system?

    The most secured version of Windows is installed on a PC with no cables attached inside a locked room.

  4. One step further by BlackPignouf · · Score: 2

    This gave me an idea! I'll launch my own Python repository, called PyPl.

  5. Solution? by Rockoon · · Score: 1

    Step 1: Require that package names are treated as case insensitive.
    Step 2: Require that all package names be at least 3 characters long.
    Step 3: Require that the minimum edit distance between the names of any two packages be at least 1/3rd of the length of the longer name.

    Now step 3 will be a problem for some. Let's suppose I develop the package "FooBar" and, while it has become semi-popular, some issues need to be addressed that will break compatibility. "FooBar2" will fail step 3 here and some will not like that, but I argue that compatibility breaking *should* lead to an entirely new name, and no, it's not lost on me that Python is the poster child of keeping the same name while breaking compatibility. I don't understand why the developers haven't apologized yet.
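    The three steps can be sketched as a name check. This is a minimal illustration, assuming "edit distance" means Levenshtein distance; the function names `edit_distance` and `name_allowed` are made up for this sketch and are not part of any PyPI tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def name_allowed(candidate: str, existing: list[str]) -> bool:
    """Apply steps 1-3 to a proposed package name."""
    name = candidate.lower()               # step 1: case-insensitive
    if len(name) < 3:                      # step 2: minimum length
        return False
    for other in existing:
        other = other.lower()
        longer = max(len(name), len(other))
        # step 3: distance >= longer / 3, written without division
        if 3 * edit_distance(name, other) < longer:
            return False
    return True
```

    With `existing = ["urllib", "FooBar"]`, both "urlib" and "FooBar2" are rejected (distance 1 against names of length 6 and 7), which is exactly the objection raised above for "FooBar2".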

    --
    "His name was James Damore."
    1. Re:Solution? by HiThere · · Score: 1

      Do you understand the difference between major and minor version numbers? Or realize that major version number changes frequently indicate breaking compatibility?

      I will grant you that Linux has (recently) dropped that tradition, but that was because the number of minor version changes has gotten too large. Very few pieces of software have that rationale. (It's also because Linus found large numbers of minor version changes esthetically unpleasant.)

      If you go back a bit further, the sub-minor version changes were also significant, in that the minor version number told you what features were available, and the sub-minor version number told you the patch level. My feeling is that this was a better system, and the only problem was that it should have been two hexadecimal digits rather than decimal digits. Sometimes there were enough patches that three digits were needed, which complicated things.

      OTOH, this is all based on memory, and that earlier change was so long ago that I've probably got some of the details wrong.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    2. Re:Solution? by Rockoon · · Score: 1

      Do you understand that we are talking about libraries, not operating system and applications?

      You go on and on and on when you don't even know what's being discussed.

      --
      "His name was James Damore."
  6. Author's signature by DrYak · · Score: 2

    What the hell would that change?

    If it were handled anything remotely like the way it is in RPM repositories, at least the identity of the author would be different.
    urlib and urllib would be submitted by 2 different authors,
    meaning that pip would show either "installing urllib, signed by 0xb00b1e5 'original@author.com' ? [Y/N]" or
    "installing urllib, signed by 0xdeadbeef 'evil@hacker.com' ? [Y/N] "
    (in a way, that is something that already happens with GitHub repositories, as the author's nickname or the company's/project's name is part of the URL)

    it's not much, but if the user has missed a single letter in the name (has happened to me, pip refusing to install 'thony' as that one didn't exist, unlike 'thonny'),
    maybe they are better at spotting a whole different author identity
    (or maybe not. Maybe most python users are that careless)
    (with their mind so busy paying attention to blank spaces and tabs)

    Also, I don't have a clear idea of the python community publishing modules on pypi (I'm more of a Perl guy than Python guy, I mostly dabbled in pypi while helping with software deployment on my university's HPC) but if the most common non-core modules are developed by a few known authors (e.g.: key 0xb00b1e5 'original@author.com' has been trusted multiple times already and the user has added it to his whitelist because he needs a lot of modules) then pip suddenly pausing to ask confirmation for a new unknown, non-whitelisted key (e.g.: key 0xdeadbeef 'evil@hacker.com' seen for the first time) is sure to stick out like a sore thumb.

    (as currently happens with 3rd party RPM repositories, e.g.: SUSE's Open Build System).
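    The whitelist idea can be sketched as a trust-on-first-use check. pip has no such prompt; the `install` function, its parameters, and the `confirm` callback are hypothetical, and the key IDs are the example ones from the text above.

```python
def install(package, key_id, signer, trusted_keys, confirm):
    """Proceed silently for whitelisted keys, otherwise ask the user first.

    `trusted_keys` is the user's whitelist (a set of key IDs) and `confirm`
    is any callable that shows a prompt and returns True or False.
    """
    if key_id not in trusted_keys:
        prompt = f"installing {package}, signed by {key_id} '{signer}' ? [Y/N]"
        if not confirm(prompt):
            return False           # user declined the unknown key
        trusted_keys.add(key_id)   # remember the key for next time
    return True                    # a real tool would install here


# Usage: a known key installs without a prompt, a new key triggers one.
trusted = {"0xb00b1e5"}
prompts = []


def decline(message):
    prompts.append(message)
    return False


ok_known = install("urllib", "0xb00b1e5", "original@author.com", trusted, decline)
ok_new = install("urlib", "0xdeadbeef", "evil@hacker.com", trusted, decline)
```

    Only the second call reaches the prompt, which is the "suddenly pausing" behaviour described above. Whether users would read the prompt rather than answer Y by reflex is a separate question.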

    Yet another way to use cryptography, would be to take notice from GPG's web of trust, or from PKI's root certificates :
    we could also imagine authorities that sign several uploader's keys as trusted.

    i.e.: one could imagine a group, called "Python Booster", who don't release modules themselves, but sign the keys of modules that they consider trusted to be in a "Python Booster Module Collection". (and optionally "pip install pbmc" launching a setup.py that installs the whole distribution).
    (So if you need a module that is trusted by one of these "module collections" you subscribe to, you'd be a bit better covered).

    In practice, that is already the end result of not installing random modules with "pip" but using the RPMs provided by your trusted distro, or by a trusted 3rd party repository.

    The only marginal finger-pointing possible here is at PyPI for allowing typo squatting, however even that is marginal.

    In addition of the cryptographic solution,
    it could also be useful for pypi.org to refuse to automatically open new module repositories for modules whose name isn't beyond a certain Levenshtein distance from an existing name, without a human reviewing the reason behind the close names.

    That won't prevent you from making a "LibreBla" fork of an "OpenBlah" project, but it would reduce the easy-to-confuse clones (you'd need to explain to a human operator that "bla2" is a maintained legacy fork of an older pre-API-change version of "bla"),
    unlike the current mess on pypi (and on CPAN for that matter).
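    A rough sketch of such a review gate, using only the standard library. Note the substitution: `difflib.get_close_matches` scores names with a similarity ratio rather than a true Levenshtein distance, and the function name `needs_human_review` and the 0.8 cutoff are assumptions picked for illustration.

```python
import difflib


def needs_human_review(new_name, existing_names, cutoff=0.8):
    """Return the existing names a new name is suspiciously close to.

    An empty list means the name could be registered automatically;
    anything else would be queued for a human to look at.
    """
    lowered = [name.lower() for name in existing_names]
    return difflib.get_close_matches(new_name.lower(), lowered, n=5, cutoff=cutoff)


existing = ["urllib", "openblah", "bla"]

close_to_urlib = needs_human_review("urlib", existing)
close_to_fork = needs_human_review("LibreBla", existing)
close_to_bla2 = needs_human_review("bla2", existing)
```

    Here "urlib" and "bla2" are held for review while "LibreBla" passes automatically, matching the fork example above.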

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Author's signature by david_thornley · · Score: 1

      If it were handled anything remotely like the way it is in RPM repositories, at least the identity of the author would be different. urlib and urllib would be submitted by 2 different authors, meaning that pip would show either "installing urllib, signed by 0xb00b1e5 'original@author.com' ? [Y/N]" or "installing urllib, signed by 0xdeadbeef 'evil@hacker.com' ? [Y/N] "

      When I'm looking for a library, I typically don't know or care who the original author was. I just want the library to do something I want done.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  7. Good luck getting signed w/o other devs in town by tepples · · Score: 2

    maybe they are better at spotting a whole different author identity

    Good luck with that, as email addresses and author usernames can also be typosquatted, and unless you have the resources of Facebook to bruteforce a hash, key IDs aren't going to be as memorable as "boobies" or "dead beef".

    installing urllib, signed by 0xb00b1e5 'original@outlook.com'
    # vs.
    installing urlib, signed by 0xdeadbeef 'origina1@outlook.com'

    I'm more of a Perl guy than Python guy [...] but if the most common non-core modules are developed by a few known authors

    Does CPAN have the same situation where "common non-core modules are developed by a few known authors"?

    Yet another way to use cryptography, would be to take notice from GPG's web of trust

    I imagine OpenPGP's web of trust would have two significant practical problems.

    Small world isn't as small as some believe: First, the small world problem wouldn't work if there isn't a critical mass of developers who fly internationally to conventions in order to make the web more dense. Or people born with an interpersonal skills disability (such as myself) or who live in a small or medium-size town with few or no other PyPI package developers would have trouble attending even a local key signing party.

    Transitivity of trust: Just because you trust someone's identity doesn't mean you trust someone's ability to verify others' identity. This reflects itself as a low weight on edges of the web of trust not adjacent to you, amplifying the "Small world isn't as small as some believe" problem.
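    A toy model makes the transitivity point concrete. It is an assumption made for illustration, not how GnuPG or any OpenPGP implementation computes key validity: each edge carries a weight between 0 and 1, and trust along a path is the product of its weights. The names and weights in the graph are invented.

```python
def best_path_trust(graph, source, target):
    """Highest product of edge weights over any path from source to target.

    `graph` maps a signer to {signee: weight}, with weights in (0, 1].
    Returns 0.0 when the target cannot be reached at all.
    """
    best = {source: 1.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, weight in graph.get(node, {}).items():
            score = best[node] * weight
            if score > best.get(neighbor, 0.0):
                best[neighbor] = score   # found a stronger path
                stack.append(neighbor)
    return best.get(target, 0.0)


# Invented example: you verified alice yourself, the rest is hearsay.
web = {
    "me": {"alice": 0.9},
    "alice": {"bob": 0.5},
    "bob": {"carol": 0.5},
}

trust_in_alice = best_path_trust(web, "me", "alice")
trust_in_carol = best_path_trust(web, "me", "carol")
trust_in_stranger = best_path_trust(web, "me", "dave")
```

    Three hops away the score is already a quarter of what it was one hop away, and a developer with no path into the web scores zero, which is the sparse-web problem described above.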

    or from PKI's root certificates

    Members of the CA/Browser Forum PKI will happily sign a domain-validated certificate for a typosquatted domain.

    it could also be useful that pypi.org refuse to automatically open new modules repositories for modules whose name isn't beyond a certain Levenshtein distance of other name present

    This raises an exception I found to Python's batteries-included philosophy: Levenshtein distance comes with one of Python's major competitors, but it's behind a third-party module in Python.

  8. "Official Repository"? No. by HiThere · · Score: 1

    PyPI isn't the official repository of the Python project; it is a useful adjunct site. It does hold lots of packages that aren't in the official repository. But it's no more the official Python repository than http://ftp.us.debian.org/debia... which also holds a lot of Python packages that are easy to install (on a Debian system).

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  9. How is key signing organized? by tepples · · Score: 1

    Then explain why DEB, RPM, Maven, CPAN, etc infrastructures work just fine?

    I honestly don't know how those work fine. How did the first Debian Maintainer on each continent travel to get his key signed by a Debian Developer, as the process requires?