GitHub Accidentally Exposes Some Plaintext Passwords In Its Internal Logs (zdnet.com)
GitHub has sent an email to some of its 27 million users alerting them of a bug that exposed some user passwords in plaintext. "During the course of regular auditing, GitHub discovered that a recently introduced bug exposed a small number of users' passwords to our internal logging system," said the email. "We have corrected this, but you'll need to reset your password to regain access to your account." ZDNet reports: The email said that a handful of GitHub staff could have seen those passwords -- and that it's "unlikely" that any GitHub staff accessed the site's internal logs. It's unclear exactly how this bug occurred. GitHub's explanation was that it stores user passwords with bcrypt, a stronger password hashing algorithm, but that the bug "resulted in our secure internal logs recording plaintext user passwords when users initiated a password reset." "Rest assured, these passwords were not accessible to the public or other GitHub users at any time," the email said. GitHub said it "has not been hacked or compromised in any way."
How can a clear text password be available to them at all to record it in a log?
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
the bug "resulted in our secure internal logs recording plaintext user passwords when users initiated a password reset."
"We have corrected this, but you'll need to reset your password to regain access to your account."
Er... are you really sure that this has been corrected?
Ask me about repetitive DNA
I guess http basic auth over TLS.
The connection is encrypted using TLS but the password is transferred in the clear (base64).
Don't use basic-auth.
If you absolutely have to, use an application specific password with restricted rights.
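To make the "(base64)" remark concrete, here is a minimal sketch: the Basic scheme only encodes `user:password`, so anything that sees the header once TLS has been terminated (a proxy, a request log) can get the password back without any key. The credentials are the example pair from the Basic auth RFC; the function names are made up for illustration.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def recover_credentials(header_value: str) -> tuple[str, str]:
    """Reverse the encoding: no key or secret is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)                       # what goes over the (encrypted) wire
print(recover_credentials(header))  # what a server-side log reader could recover
```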
you feed the string and the salt into an encryption algorithm like sha512, which produces a HASH; this is what gets stored
Argh!
No!
NO!!!
NO-NO-NO-NO!!!!
DO NOT USE HASHES ! (like Sha512).
These are designed to be *fast* (1), meaning that an attacker can realistically guess the password out of the hash simply by brute-forcing the most common passwords and variations thereof with the same salt and seeing if they match.
(1 - And remember that the "terahashes" that ASIC bitcoin miners report are exactly that: trillions of SHA-256-like computations per second.)
USE KEY-DERIVATION FUNCTIONS (KDF) INSTEAD !
Like the bcrypt used by GitHub as mentioned in the summary. Or scrypt (the same one used by tarsnap). Or Argon2, etc.
These also produce a value out of a password and a salt, but they are extremely slow on purpose (e.g. by repeating a hash function over and over for a high number of iterations).
If each computation takes some time, it doesn't impact login that much (After all, you only need to log in once at the beginning of your session), but it hinders anyone wanting to brute force your password out of a stolen hash.
It makes data breaches that managed to steal your user database a lot less dangerous (because once you have successfully guessed the password from the hash, the next step is to see all the other places where the user has re-used the same password).
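A rough sketch of the difference, using only Python's standard library: one salted SHA-512 versus one scrypt derivation for the same password. The scrypt cost parameters are illustrative assumptions, not a recommendation, and `hashlib.scrypt` needs a Python built against OpenSSL 1.1 or newer. Timings vary by machine, so none are quoted here.

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

def fast_hash(pw: bytes, salt: bytes) -> bytes:
    # One round of salted SHA-512: cheap for the server, equally cheap for an attacker.
    return hashlib.sha512(salt + pw).digest()

def slow_kdf(pw: bytes, salt: bytes) -> bytes:
    # scrypt with assumed, illustrative cost parameters (about 16 MiB of memory per call).
    return hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1, dklen=64)

# Time a single guess with each function.
for name, fn in (("sha512", fast_hash), ("scrypt", slow_kdf)):
    start = time.perf_counter()
    fn(password, salt)
    print(f"{name}: {time.perf_counter() - start:.6f} s per guess")
```

The cost of one slow call is paid once per login, but an attacker pays it for every single guess.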
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
3) you feed the string and the salt into an encryption algorithm like sha512, which produces a HASH; this is what gets stored
Except sha512 is a hashing algorithm, not an encryption algorithm.
Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.
So how is this "random salt" recovered when you need to check the password's validity?
It's stored along in the data base.
Most stored passwords have a form like:
${type of algorithm used}${parameters used}${data}
where:
- "type of algorithm used" tells you what was used to generate this (e.g. bcrypt, like GitHub as mentioned in the summary).
- "parameters used" is any extra data that the algorithm needs to generate password checks.
Like the salt.
Or like the number of iterations. Because nobody sane actually uses a hash function such as SHA512 on its own anymore. Instead you use a Key Derivation Function (KDF) such as bcrypt (or scrypt or Argon2), and those are *slow* on purpose, to make brute-forcing much less practical (e.g. they slow things down by repeating a hash for a large number of iterations).
- "data" is the actual salted output that you need to replicate to successfully log in.
The exact implementation varies (the above is typically used by the "crypt" function used, e.g., for Linux log-ins),
but basically they are all the same: the salt (and iterations) are stored together with the "hash" that you need to test.
And most KDF functions can work as "hash_to_compare = KDF(password_login_attempt, old_hash_from_database)", i.e. they can automatically extract the parameters if you give them the string that is in the database, and generate the hash the exact same way.
They'll invent a new salt (and guess the optimal number of iterations) only if you omit the old hash and give the new password as the single parameter.
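The scheme described above can be sketched roughly like this. To stay inside the standard library it uses PBKDF2-HMAC-SHA256 rather than bcrypt, and the `$pbkdf2-sha256$...` layout and both function names are made up for the example; they are not the format of any particular crypt implementation.

```python
import base64
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000) -> str:
    """Return '$pbkdf2-sha256$<iterations>$<salt>$<hash>' (illustrative layout)."""
    salt = os.urandom(16)  # a fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    b64 = lambda raw: base64.b64encode(raw).decode("ascii")
    return f"$pbkdf2-sha256${iterations}${b64(salt)}${b64(digest)}"

def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash using the parameters embedded in the stored string."""
    _, scheme, iterations, salt_b64, digest_b64 = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unsupported scheme: {scheme}")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    return hmac.compare_digest(candidate, expected)  # constant-time comparison

record = hash_password("hunter2")
print(record)
print(verify_password("hunter2", record), verify_password("hunter3", record))
```

Nothing in the stored record is secret on its own: the salt and iteration count sit in the clear next to the hash.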
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The salt isn't secret; it's just used to prevent rainbow tables from being useful. If you store passwords as unsalted hashes, then an attacker can construct a large table of the hashes of all 8-character inputs and compare each of your hashes against their table. If there's a match, then they have a password that will work. If you add a salt, then they can't use such a table, because they have to check each 8-character sequence with the salt prepended. If the salt is different for each password (as it should be), then there's no benefit from pre-calculating the table. If it takes 2 hours of GPU time to compute the table, then with unsalted passwords that's a one-time cost and you can then crack any weak password in a leaked password database almost instantly. In contrast, with a leaked salted password database you have to spend those 2 hours again for every single account you attack.
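A toy illustration of that argument. It uses a plain lookup table rather than a real rainbow table (which compresses the table with hash chains), a four-entry password list, and a bare SHA-256; all of those are simplifications chosen for the example.

```python
import hashlib
import os
from typing import Optional

common_passwords = ["123456", "password", "qwerty", "letmein"]

# Unsalted: one table, built once, works against every leaked database.
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in common_passwords}

def crack_unsalted(leaked_hash: str) -> Optional[str]:
    return precomputed.get(leaked_hash)  # a single dictionary lookup

def crack_salted(salt: bytes, leaked_hash: str) -> Optional[str]:
    # The table is useless here; every guess must be re-hashed with this user's salt.
    for guess in common_passwords:
        if hashlib.sha256(salt + guess.encode()).hexdigest() == leaked_hash:
            return guess
    return None

salt = os.urandom(16)
unsalted_leak = hashlib.sha256(b"qwerty").hexdigest()
salted_leak = hashlib.sha256(salt + b"qwerty").hexdigest()

print(crack_unsalted(unsalted_leak))      # found via the precomputed table
print(crack_unsalted(salted_leak))        # the table does not help
print(crack_salted(salt, salted_leak))    # found, but only by redoing the work for this salt
```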
I am TheRaven on Soylent News
OK, that's naughty and needs fixing, but it's internal logs, did it need a slashdot story?
Basic auth is an HTTP header, and HTTP headers are just as protected by TLS as response headers and bodies. Otherwise, HTTPS would be ineffective against Firesheep-style attacks that clone a session cookie. The other common means of authentication is submitting a password that has been entered into a field of an HTML form as part of an HTTP POST request body. What's any more "in the clear" with HTTP basic authentication than with the form route?
And in case you believe both forms and basic authentication ought to be replaced, what other means would you prefer? I can think of three, each with serious drawbacks:
- HTTP Digest authentication: This does hashing using a random initialization vector. However, it requires the server to store the password rather than only an irreversible hash for verification.
- Some zero-knowledge proof means: Because this is not built into the HTML5 standard, it requires running script in the browser. Though web browsers by default run all scripts, many users change this for security and data cap reasons. Extensions exist to restrict script execution to a domain whitelist (JavaScript Switcher), a fine-grained whitelist (NoScript), or only those scripts whose source code is machine-readably available to the public under a free software license (LibreJS). Some go so far as to regularly browse the web with all scripts turned off.
- Client certificates: TLS supports the use of a client certificate that identifies a user, which is exactly analogous to key-based authentication in SSH. However, browser publishers have thus far given no significant attention to usability of common use cases, such as choosing the right client certificate for a particular origin, synchronizing client certificates across devices that a user uses, or even something as simple as logging out.
A better question is why doesn't the HTML standard for password fields allow automatic hashing with a custom salt?
It does; it's called digest authentication. But depending on how digest authentication is implemented, it is vulnerable to one of two attacks. If the realm portion is fixed, digest is vulnerable to a replay attack that passes the hash. If the realm portion is variable, it requires the server to store the unhashed password. In addition, digest authentication still uses MD5, which is deprecated and whose immediate successor (SHA-1) is also deprecated.
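For reference, a sketch of how a digest response is computed in the MD5, `qop=auth` variant of RFC 2617. The input values are the worked example from that RFC; the function names are made up. Note that in the RFC the server-chosen value that changes between challenges is called the nonce.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    # HA1 depends only on long-lived values, so the server has to keep either the
    # password itself or HA1, and HA1 is enough to answer any future challenge.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    # The nonce (server) and cnonce/nc (client) are what tie a response to one challenge.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

args = dict(username="Mufasa", realm="testrealm@host.com", password="Circle Of Life",
            method="GET", uri="/dir/index.html", nc="00000001", cnonce="0a4f113b")

first = digest_response(nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093", **args)
second = digest_response(nonce="a-different-nonce", **args)
print(first)
print(second)  # a captured response is only replayable if the server reuses the nonce
```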
ntr
Is 27 million a small number?..
In Soviet Washington the swamp drains you.
Logging a password is a beginner's mistake, like SQL injection. I found the same bug in unreleased code many years back, and raised it to management so we could track down the engineer who did it. It's the kind of (cough) mistake that can be the "straw that broke the camel's back" when dealing with an engineer who has (cough) "negative productivity."
Ideally, this kind of bug should be caught in code reviews. As someone who reviews a lot of code, even I'll admit that it's possible for something like this to slip through.
No, I will not work for your startup
I got the email.
I was impressed that it was handled quickly.
I'm even more confident because I actually use a proper password manager making sure I have unique passwd's for everything.
I give them credit -- here they found their own security issue before it became a breach, they fixed it, and they didn't sweep it under the rug but instead they notified their users. Kudos for being forthcoming.
PBKDF2 uses SHA variants in its iterations.
Despite "Shattered", it's not "broken" yet.
There are just better, more modern KDFs (like the bcrypt used by GitHub, like the scrypt designed for use in tarsnap, or like Argon2, which is the latest competition winner) that don't have PBKDF2's shortcomings (e.g. the collision between a long input passphrase and its SHA-1).
Regarding "Shattered": you have to understand its context.
SHA-1 has been known for quite some time to be weaker than its output size suggests (a 160-bit digest should give about 80 bits of collision resistance, and SHA-1 gives less).
(It's the main reason why SHA-2 was developed and is now widely used in cryptography, and part of the reason why SHA-3 was recently developed through a competition (the other reason being that SHA-3 / Keccak also introduces some novel, interesting concepts).)
Because of this, it was widely speculated that collisions could be found.
A team of security researchers spent massive resources (lots of computation time) searching for a collision (not brute-forcing the whole output space of SHA-1, which would be infeasible in any reasonable time, but cleverly exploiting the known weaknesses of SHA-1 mentioned above).
After spending a considerable amount of time, they managed to create two different blocks of (complete nonsense, random-looking) data that happen to hash to the exact same value.
It's not that they can generate collisions on a whim; they can generate a collision at a tremendous computational cost (but still an achievable cost, unlike searching the whole output space), and thus far they have managed to generate exactly 1 such collision.
Also, due to the block-iterative way SHA-1 (and most other pre-SHA-3 hashes) operates, you can stick one of these blocks in a file in a specific way and get the same hash as an otherwise identical file containing the other block.
That severely limits the possible uses of this collision. You need a situation where you can store arbitrary noisy binary data, and have a program that can react to the presence of one or the other piece of data.
Currently, the only successful demo of Shattered is a pair of PDF files, because PDF can store arbitrary blobs (e.g. embedded bitmap data for illustrations, fonts, etc.) and the colliding blocks sit inside embedded image data, so which block is present decides what the viewer displays.
So you can craft two PDFs that hash to the same SHA-1 sum but display two different documents, depending on which of the two collision blocks is stored in the blob.
It's pretty limited in practical use.
In PBKDF2, it means that you can have two long passphrases that will generate the same SHA-1 on the first round of PBKDF2 (so you have a triple collision: both long passphrases containing the 2 blocks of Shattered, and their SHA-1 sum).
But the exploitability of such a situation is quite limited (it takes complex scenarios, like an oracle handing out passwords and Eve secretly colluding with the oracle, so that the oracle gives two provably different passwords to Alice and Eve (e.g. if they compare the SHA-256 or SHA-3 of the passwords, they are different), yet Eve can use her password to unlock Alice's stuff, and vice versa).
So :
TL;DR: Shattered isn't affecting PBKDF2 directly that much, but people have moved to more modern KDFs anyway, because they are better.
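The "long passphrase collides with its own SHA-1" shortcoming mentioned above is easy to check. A small sketch, assuming PBKDF2-HMAC-SHA1 with an arbitrary salt and iteration count: HMAC replaces any key longer than the hash's 64-byte block with its digest, so the passphrase and its raw SHA-1 digest behave as the same password.

```python
import hashlib

salt = b"illustrative-salt"   # arbitrary values for the demo
iterations = 1000

# Longer than the 64-byte SHA-1 block size, so HMAC hashes it before using it as a key.
long_passphrase = b"a very long passphrase that runs well past the sixty-four byte block size of SHA-1"
its_sha1 = hashlib.sha1(long_passphrase).digest()

key_from_passphrase = hashlib.pbkdf2_hmac("sha1", long_passphrase, salt, iterations)
key_from_digest = hashlib.pbkdf2_hmac("sha1", its_sha1, salt, iterations)

print(len(long_passphrase) > 64)
print(key_from_passphrase == key_from_digest)
```

This is a property of HMAC's key handling and needs no SHA-1 collision attack at all; Shattered only adds the possibility of two different long passphrases sharing that digest.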
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Don't tell people not to use hashes. The next thing they'll think is "Oh, I should use plaintext instead".
A key-derivation function is also insufficient, since the output is only as strong as the input. Meaning if you have a 10-bit password, the resulting KDF strength will still only be 10 bits.
You must use a Password-Based Key Derivation Function. A PBKDF can add ~10 bits of security to a password. So if a user gives "password" as their password (2 bits), the resulting hash has ~12 bits of security.
Wonder what the public key field is for?
IIUC, what happened is that the passwords that were revealed were those of users who were resetting their account passwords. So this would mean that they needed to send the new password in clear text to the site, at which point it was logged before being hashed.
This is still a problem, but I'm not sure that there is any reasonable way around it. It just shouldn't have been logged.
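One common guard against this class of bug is to scrub request parameters before they reach the logger. A minimal sketch with hypothetical field names and a hypothetical `redact` helper; it says nothing about how GitHub's own code or logging works.

```python
import logging

# Assumed field names; a real application would list whatever its forms actually send.
SENSITIVE_KEYS = {"password", "new_password", "password_confirmation"}

def redact(params: dict) -> dict:
    """Return a copy of the request parameters that is safe to write to a log."""
    return {
        key: "[FILTERED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in params.items()
    }

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("password_reset")

request_params = {"login": "octocat", "new_password": "hunter2"}
log.info("password reset requested: %s", redact(request_params))
```

A blocklist like this only helps for fields someone remembered to list, which is why a newly introduced parameter can still slip through.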
I think we've pushed this "anyone can grow up to be president" thing too far.
So how is this "random salt" recovered when you need to check the password's validity?
In addition to what others have said about each password having a random salt stored alongside, you can combine the salt with another string that's stored in a secure hardware module. This string is the same for all passwords, but it can only be accessed by the application. This makes determining passwords even more difficult for someone who gets a copy of the database, since they don't have the entire string that was passed to the hashing algorithm.
Because computer scientists think they're funny, they call the secure string a "pepper".
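A minimal sketch of salt plus pepper, under loud assumptions: the pepper is hard-coded only so the example runs (in the setup described above it would live in the secure hardware module), PBKDF2 stands in for whatever slow KDF is actually used, and the function name is made up.

```python
import hashlib
import hmac
import os

# Assumption: in a real deployment this comes from an HSM or secrets store,
# never from the database or the source code.
PEPPER = b"demo-pepper-not-a-real-secret"

def hash_with_pepper(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # Mix in the application-wide pepper first, then run the slow, salted KDF.
    peppered = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)

salt = os.urandom(16)                       # per-user, stored next to the hash
stored = hash_with_pepper("hunter2", salt)  # what goes in the database
print(hmac.compare_digest(stored, hash_with_pepper("hunter2", salt)))
```

Someone holding only the database has the salt and the output but not the pepper, so they cannot even start testing guesses offline.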
I don't really see why people are so against hashes that they need to shout.
My main reason was comically over-exaggerated "hysteria".
The actual reason why people are against hashes, is a combination of three factors :
1. So guessing passwords out of (fast) hashes is completely doable for anyone with a little bit of resources (paying a tiny sum to rent GPUs in the cloud).
2. Just have a look at http://haveibeenpwned.com/ . Very often (though not always), attackers manage to get the password hashes. If you've been using a fast hashing function like SHA, guessing a significant proportion of the passwords is largely possible (like point 1. above) at the cost of some GPU cloud-renting.
3. We humans are stupid and tend to reuse passwords. Once you have managed to successfully guess a password from point 2, you can try to see if it unlocks the e-mail account associated with the account in the database, or any other account you can find online associated with the same email and/or username and/or real identity (depending on what the leaked db provides to you).
That last one gives you tons of social engineering and identity theft/impersonation possibility to "profit!!!" from. So you can guess it is something that could happen in the wild.
---
(*) -- (when asked to follow password rules, humans will generally put the capital letter at the beginning, use 5 to 6 letters, then put 2 to 4 numbers, and the special character at the end; most of the time it will be "!". The number of combinations that follow these rules is vastly smaller than what "[A-Za-z0-9_!#@-]{8,16}" would imply)
Yes, bcrypt and similar are better and should be used. But I'd consider a hash, if properly used, still reasonably secure.
The vast difference is that bcrypt, scrypt and Argon2 are designed on purpose to slow down brute-forcing and to make FPGAs and ASICs difficult to use (by using lots of iterations, and by requiring lots of memory).
Point 1. from the list above doesn't hold true anymore, so if the KDF's hashes get leaked in point 2., you can't gain much from them.
By properly used I mean hash(hash(password + salt) + salt), where + stands for concatenation. Even better if it has some concatenated pepper, too.
You don't even need to remember that formula if you remember the letters "hmac"...
For a typical /. geek who:
- generated a purely random string from /dev/random+base64 (good luck using patterns or common password lists on that!)
- and uses 1 different password for each typical site (no password reuse)
( - and uses a secure password manager to keep them organised)
- and has activated 2-factor auth (like Google Auth) on each website that supports it (so even if a password is somehow guessed correctly by sheer luck, it's not useful on its own).
Yup, salted hashes are good enough.
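For the curious, the "/dev/random+base64" recipe from the list above is about three lines. This sketch assumes `os.urandom` (the OS random source, which is /dev/urandom rather than /dev/random on Linux) and an arbitrary length of 18 bytes; the function name is made up.

```python
import base64
import os

def random_password(n_bytes: int = 18) -> str:
    """Base64-encode n_bytes taken from the operating system's random source."""
    return base64.b64encode(os.urandom(n_bytes)).decode("ascii")

# 18 random bytes = 144 bits, encoded as 24 base64 characters with no padding.
print(random_password())
```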
For the rest of the normal humans, the 3 points I've listed above are a real danger.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
"It's unclear exactly how this bug occurred"... riiiiiiiight, git blame?