Over 100,000 GitHub Repos Have Leaked API or Cryptographic Keys (zdnet.com)

Posted by msmash on Friday March 22, 2019 @02:42AM from the security-woes dept.

A scan of billions of files from 13 percent of all GitHub public repositories over a period of six months has revealed that over 100,000 repos have leaked API tokens and cryptographic keys, with thousands of new repositories leaking new secrets on a daily basis. From a report: The scan was the object of academic research carried out by a team from the North Carolina State University (NCSU), and the study's results have been shared with GitHub, which acted on the findings to accelerate its work on a new security feature called Token Scanning, currently in beta. The NCSU study is the most comprehensive and in-depth GitHub scan to date and exceeds any previous research of its kind. NCSU academics scanned GitHub accounts for a period of nearly six months, between October 31, 2017, and April 20, 2018, and looked for text strings formatted like API tokens and cryptographic keys.

52 comments

A "scan and ban" function? by ctilsie242 · 2019-03-22 02:49 · Score: 1

I wonder if GitHub could offer a service where, if it detected a committed key, be it an API key, PGP, SSH, or others, it would automatically disable that item on the relevant repository. This wouldn't stop the best of the best, but it would at least be some remedial security... far better than none.
1. Re:A "scan and ban" function? by tepples · 2019-03-22 02:51 · Score: 3, Insightful
  
  I'm interested in the algorithm that you propose that GitHub use to determine whether a 32-character alphanumeric string embedded in the source code is an API key or something else.
2. Re:A "scan and ban" function? by Narcocide · 2019-03-22 03:13 · Score: 1
  
  Well, it probably wouldn't be a bad idea for it to at least issue a warning when you check in ~/.ssh by accident. A lot of these private files should be easily identifiable by path and name if they are not being put there on purpose.
3. Re: A "scan and ban" function? by Anonymous Coward · 2019-03-22 03:53 · Score: 0
  
  Yes, that guy is a moron
4. Re:A "scan and ban" function? by Anonymous Coward · 2019-03-22 05:21 · Score: 0
  
  I'm interested in the algorithm that you propose that GitHub use to determine whether a 32-character alphanumeric string embedded in the source code is an API key or something else.
  The scan was the object of academic research carried out by a team from the North Carolina State University (NCSU), and the study's results have been shared with GitHub
  How about THAT algorithm, JFC!
5. Re:A "scan and ban" function? by Anonymous Coward · 2019-03-22 05:40 · Score: 0
  
  No it's impossible, teppledorf, that’s why you’re not literally looking at an article about a remote scan for API keys right now, and it doesn't happen all the time leading to compromised accounts. Totally impossible to say this is an alphanumeric string with high entropy, UNPOSSIBLE!
  It would be absolutely _trivial_ to scan for common patterns and offer a service that blocks items from being downloaded until someone acknowledges an alert or adds something to a whitelist.
  If you were going to sell a service like github this would be a no-brainer for a premium feature.
6. Re:A "scan and ban" function? by weilawei · 2019-03-23 05:19 · Score: 1
  
  For starters, you might try compressing stuff and see what's incompressible, and therefore likely random binary data. You can do some sifting for false positives (images by magic number, etc.), and the rest should be a reasonable pile of cryptographic data.
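The two heuristics floated in this thread (high character entropy, and data that does not compress) can be sketched in a few lines of Python. This is only an illustration of the idea, not what GitHub or the NCSU team actually run; the function names are made up for the example, and any cutoff you apply to the results is a judgment call that will produce false positives (hashes, minified code, images).

```python
import math
import zlib
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size.

    Values near (or slightly above) 1.0 mean zlib could not shrink the
    data, which is what random key material tends to look like.
    """
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)
```

A scanner built on this would split files into whitespace-free tokens, score each one, and then sift out known-benign cases, as the comment above suggests.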
How to distribute client credentials to end users? by tepples · 2019-03-22 02:50 · Score: 2

Say a desktop or mobile application distributed as free software in source code form acts as a client for some Internet service. How is the application's developer supposed to distribute the required API key to the user's machine without exposing it in the source code? Or is each user of the application supposed to apply for API keys for his or her own copy of the application?
(See also my previous thoughts on the API key matter)
Oh dear by newbie_fantod · 2019-03-22 02:53 · Score: 2

Gee, if only there was some quick and easy way to migrate from GitHub to Sourceforge... Oh wait!
1. Re: Oh dear by Anonymous Coward · 2019-03-22 02:56 · Score: 0
  
  It doesn't take long if you're not some paranoid encryption developer who blocks every port the repositories use. Oh no! If I enable this feature someone might read my free trial API key somehow!
2. Re:Oh dear by chrism238 · 2019-03-22 05:18 · Score: 1
  
  You've been (effectively) spammed!
3. Re:Oh dear by radarskiy · 2019-03-22 06:07 · Score: 1
  
  Do you actually think leaking keys on Sourceforge is better than leaking keys on GitHub, or have I missed a joke somewhere?
4. Re:Oh dear by newbie_fantod · 2019-03-22 10:15 · Score: 1
  
  Bad joke on my part. The first thing I always see on Slashdot is a banner advertising "Migrate from GitHub to SourceForge quickly and easily with this tool..", apparently not everybody here is so targeted.
Re: How to distribute client credentials to end us by Anonymous Coward · 2019-03-22 03:10 · Score: 0

An API key is by design a form of authentication, so distributing it to everyone with the source code or even the binaries makes no sense at all.
The API key should be requested by the client after some other form of authentication, or entered manually on the client after obtaining it in a secure way from the API provider.
I suspect many are just fake by Anonymous Coward · 2019-03-22 03:11 · Score: 0

As the author replaces real keys with keyboard mashed keys
1. Re:I suspect many are just fake by quintus_horatius · 2019-03-22 03:50 · Score: 3, Funny
  
  NCSU academics scanned GitHub accounts for a period of nearly six months... and looked for text strings formatted like API tokens and cryptographic keys.
  Or maybe they were just misidentifying old Perl scripts
Either that or by phantomfive · 2019-03-22 03:11 · Score: 1

It could be the researchers made a mistake in their regular expression, so that it picks up things that look like keys but aren't keys.

If this does happen to you it's because you aren't doing code review. If you are solo, then give yourself a quick review by doing a "git add" on each individual file before committing. That gives you a chance to double-check, and you can even do a "git diff" on each file before committing to be extra sure. There are lots of processes you can use to avoid this kind of mistake.

--
"First they came for the slanderers and i said nothing."
1. Re: Either that or by Anonymous Coward · 2019-03-22 03:14 · Score: 0
  
  Review? As in PR? Too late .. Or did you mean extreme programming pre-push review? No one does that
2. Re: Either that or by phantomfive · 2019-03-22 03:26 · Score: 1
  
  Review? As in PR? Too late .. Or did you mean extreme programming pre-push review?
  
  To commit to git, you have two options: you can either commit the files one at a time, or commit them all at once without looking at what you are committing. One of those options leads to mistakes, even if the mistakes aren't as serious as committing a private key.
  
  No one does that
  You don't need to, but if you find yourself committing things you shouldn't, then make a change in your personal process so that doesn't happen.
  
  --
  "First they came for the slanderers and i said nothing."
3. Re: Either that or by Jaime2 · 2019-03-22 03:55 · Score: 1
  
  I've been managing developers for many years. You'd probably be amazed how many developers simply blindly commit and pray that someone else set all of the ignore rules up correctly. I have this conversation at least once per month: "Why did you commit X"... "I didn't do that, [git/hg/svn/cvs/tfs] did that." I occasionally get "Why did you delete Bob's code"... "I didn't, there were no merge conflicts".
  Or the worst version of this... "Ohhh, you mean I shouldn't have checked code in that will overwrite the production database next time one of the automated tests runs?"
4. Re: Either that or by phantomfive · 2019-03-22 04:07 · Score: 1
  
  I occasionally get "Why did you delete Bob's code"... "I didn't, there were no merge conflicts".
  lol "I did 'git push -f' and it worked fine!"
  
  --
  "First they came for the slanderers and i said nothing."
5. Re: Either that or by Anonymous Coward · 2019-03-22 06:24 · Score: 0
  
  Committing file by file would make a mess of the history, leaving the history of the repo completely useless, unless you squash, but that makes history useless/less useful too
  Not that it prevents keys from being committed.
  Any such solution must be automated
  And your operations must imho allow for quick key rotation should that happen anyway
6. Re: Either that or by Anonymous Coward · 2019-03-22 06:30 · Score: 0
  
  File by file - Not to mention the horrendous tedious process that no one would follow for very long and would slow things down instead of speeding dev process
  How big are your commits? Let's say you are agile and a story takes you two days? New feature across several services, layers plus tests etc....
  Jaysus
7. Re: Either that or by phantomfive · 2019-03-22 06:37 · Score: 1
  
  You add files one by one, using the "git add" command, then you commit them all at once using the "git commit" command. I don't think you know how git works.
  
  --
  "First they came for the slanderers and i said nothing."
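For anyone who wants the "double-check what you staged" step from this thread automated, here is a rough Python sketch. It only looks at file names, not contents, and the pattern list is an illustrative assumption rather than any standard; `suspicious_paths` and `staged_paths` are hypothetical helper names.

```python
import fnmatch
import subprocess

# Illustrative name patterns only; a real project would tune this list.
SUSPICIOUS_PATTERNS = [
    "id_rsa", "id_dsa", "id_ecdsa", "id_ed25519",
    "*.pem", "*.p12", "*.ovpn", ".env",
]


def suspicious_paths(paths):
    """Return paths that sit under a .ssh directory or match a name pattern."""
    flagged = []
    for path in paths:
        parts = path.replace("\\", "/").split("/")
        name = parts[-1]
        in_ssh_dir = ".ssh" in parts[:-1]
        matches_name = any(
            fnmatch.fnmatchcase(name, pattern) for pattern in SUSPICIOUS_PATTERNS
        )
        if in_ssh_dir or matches_name:
            flagged.append(path)
    return flagged


def staged_paths():
    """List files staged for the next commit (must run inside a git work tree)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Called from a pre-commit hook, a script like this could print `suspicious_paths(staged_paths())` and exit non-zero when the list is not empty, which makes git abort the commit. It does nothing for keys pasted into ordinary source files, so quick key rotation still matters.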
but by alessi_brand · 2019-03-22 03:17 · Score: 3, Insightful

How do they differentiate bogus keys from real keys? In my projects I deliberately include keys that are valid, but won't get you into anything but 'local' applications running with no sensitive data. There are plenty of valid reasons (integration tests, clone-and-run dev applications, etc) to have 'valid' but practically useless keys in github.
1. Re:but by fph+il+quozientatore · 2019-03-22 04:36 · Score: 1
  
  Those credentials are identified by the online service they work for (Google API keys, Amazon AWS, Facebook tokens...), so in theory they could just try them and see if they log you on. It looks like they did that, at least in part, because they determined that "the vast majority" of the .openvpn access keys they found used key-only authentication and were not paired with a second factor like a password.
  
  --
  My first program:
  Hell Segmentation fault
2. Re:but by Anonymous Coward · 2019-03-22 05:56 · Score: 0
  
  How do they differentiate bogus keys from real keys? In my projects I deliberately include keys that are valid, but won't get you into anything but 'local' applications running with no sensitive data. There are plenty of valid reasons (integration tests, clone-and-run dev applications, etc) to have 'valid' but practically useless keys in github.
  There are a million and one ways to easily store K/V pairs today and here we are talking about how can we scan for passwords in source control if we like checking passwords into source control. Unbelievable. So how’s this work, dev/qa can clone and run, production is a big search and replace fest?
3. Re:but by Anonymous Coward · 2019-03-22 06:33 · Score: 0
  
  Yeah, it does seem pretty silly to put too much faith in this report. I know of several repositories that contain "text strings formatted like API tokens and cryptographic keys", where they are intentionally included, either as an invalid token to demonstrate what the format of a valid token would be or, in the case of cryptographic keys, for use in unit tests for testing encryption of dummy data.
4. Re:but by imidan · 2019-03-22 06:47 · Score: 1
  
  Yeah, I have several GitHub projects where I've left passwords in the code. The passwords work on a local instance of some API that's exposed on a port that isn't open outside the machine it's on, has NAT without port-forwarding between it and the Internet, and is only running when I turn it on. The passwords, themselves, are randomly generated and not reused on other services, so they don't leak any particular information about my passwords elsewhere. When I put the code into production, I use a different instance of the API and so create new passwords. I don't see any vulnerability here that would be exploitable without tremendous effort and luck (plus a strong motivation to break into a service of extremely low value), so I don't worry about it.
  So I wonder if these passwords would get caught up in such an analysis. I know I've had people come up to me at conferences and worriedly tell me that I've left a password in my code. I reassure them that it's okay and the passwords are fairly inert.
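One way the "clone and run" setups described in this thread can avoid a search-and-replace step for production, sketched in Python: keep the inert local key as a checked-in default and let the environment override it. `LOCAL_DEV_KEY`, `EXAMPLE_API_KEY`, and `load_api_key` are hypothetical names chosen for the example, not anything from the article.

```python
import os

# Hypothetical, deliberately inert default that is safe to commit because it
# only works against a local instance with no sensitive data.
LOCAL_DEV_KEY = "local-dev-only-not-a-secret"


def load_api_key(environ=None):
    """Return (key, is_dev_default).

    Uses the EXAMPLE_API_KEY environment variable when it is set and
    non-empty; otherwise falls back to the checked-in local default.
    """
    env = os.environ if environ is None else environ
    key = env.get("EXAMPLE_API_KEY")
    if key:
        return key, False
    return LOCAL_DEV_KEY, True
```

The second return value lets the application refuse to start in production, or bind only to localhost, when it is still running on the checked-in key.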
why is it by FudRucker · 2019-03-22 03:19 · Score: 1

that every time microsoft gets ownership of something a few weeks or months later some bad shit like this happens, makes me wonder if a lot of this sort of thing is an inside job,

--
Politics is Treachery, Religion is Brainwashing
1. Re:why is it by OzPeter · 2019-03-22 03:28 · Score: 3, Informative
  
  that every time microsoft gets ownership of something a few weeks or months later some bad shit like this happens, makes me wonder if a lot of this sort of thing is an inside job,
  MS didn't force people to upload keys to 100,000 repositories. This is not a MS thing and implying it is is pure flamebait.
  
  --
  I am Slashdot. Are you Slashdot as well?
No Biggie by Anonymous Coward · 2019-03-22 03:26 · Score: 0

Thanks to webscale your keys will be lost in the crowd, so you can continue with your dog-shit practices, secure in the knowledge that even if they find yours, they probably won't have time to use them.
Keep on developing your Rube Goldberg machines with Twitter, Facebook, IFTTT, Zappier, Slack, Alexa skills...
Fuck security. Ain't nobody got time for that!
Bit confusing summary by houghi · 2019-03-22 03:37 · Score: 1

What I gather from the article is that people put the key in their code. It is not that GitHub did anything unsafe.
I assume what it could do is block sites that put the key in their code somehow. Obviously if I post the key somewhere else (e.g. here) there would not be a lot that could be done. It also means that code from GitHub cannot be called safe to use, as you never know if somebody has not added malicious code to something that was safe before.
If I am somewhere mistaken, please do not hesitate to correct me.

--
Don't fight for your country, if your country does not fight for you.
1. Re:Bit confusing summary by Jaime2 · 2019-03-22 03:59 · Score: 1
  
  The summary and the article are clearly calling out naive GitHub users, not GitHub itself. They chose to dredge GitHub because it's popular, not because they suspected any wrongdoing on Github's behalf.
Were they real keys? by jeff4747 · 2019-03-22 03:38 · Score: 1

So, it looks like they more-or-less did a regex for things that looked like keys.
How did they know they were "real" keys? If I check in some integration tests, they're going to need a key.....and no one should use that key in anything other than a local integration test. Nor would they expect to since it's in "test" folder only used to build and run tests.
Or you might check in a key to provide an "example" mode with all sorts of warnings about "change this key before production", similar to how many web services will out-of-the-box use http instead of https. Not ideal, but not necessarily an issue.
Re: How to distribute client credentials to end us by tepples · 2019-03-22 03:42 · Score: 1

The problem then comes when a service requires that users be 13 to use the service but 18 to register as a developer in order to obtain an API key. It means 13 to 17 year olds are required to either use proprietary software or not use the service.
Re:Over 100000 ACs Have Gotten Frist Posts Since 1 by Anonymous Coward · 2019-03-22 03:44 · Score: 0

Whipslash and editor crew: We know you do that for burying posts to game the tree structure of replies here, nothing more. Do you think we don't see and know that?
Source Code Or Configuration Files by deKernel · 2019-03-22 03:46 · Score: 1

I guess my question is whether the keys in question are in source modules or just configuration files. If they are in configuration files, how do they know these aren't just test keys that will later get changed to production values?
Re:How to distribute client credentials to end use by jeff4747 · 2019-03-22 03:47 · Score: 1

If you're talking about something like ssh, you distribute the public half of the key and not the private half. If you're talking about something like https, you get a cert from one of the official places, and don't distribute it at all (you could make your own cert and distribute the public half, but it's more painful). If you're using a key for user authentication, each user is going to need to generate their own key and you aren't distributing anything.
There are valid reasons to check in a private key (integration tests, "dev mode"), and it can be made clear that those keys are not for production use (documentation, it's in the "test" folder and only used by the testing engine, only listen on localhost if it's the checked-in key, etc)
Filler data by Anonymous Coward · 2019-03-22 04:01 · Score: 0

How many of these are actually filler?
Do you care if your public GitHub repository is full of fake SSH private keys, 'pass1234', 'changeme' or 'replace-this-fake-value-in-your-code'? No, you do not.
The article mentions that the basis for guessing that a credential is likely valid is that it is for a single user account instead of a multiple user account.
That is baseless. Attempt to use the credentials. Scan for obvious fakes. There is no reasonable basis to classify these passwords and keys except testing.
If you work with Java then you know that 'changeme' is the default (and often mandatory) password on a Java PKCS12 keystore. Some applications cannot even deal with a different credential - it's hard coded into the software.
But it is sad that almost one in ten of the developers put out something, now permanently in the archives and exports, then tried to redact it. You'd hope that software developers smart enough to use git would know that the more embarrassing it is, the longer it stays on the Internet.
Re: How to distribute client credentials to end u by Anonymous Coward · 2019-03-22 04:04 · Score: 0

What kind of weirdo edge case are you making up? First you have no clue about basic credential management, then you go off on some weird tangent about teens? Are you a pedophile?
Re: How to distribute client credentials to end u by tepples · 2019-03-22 05:51 · Score: 1

What kind of weirdo edge case are you making up?
An edge case that has occurred in my own circle of friends. I have relatives who joined Twitter before age 18.

Are you a pedophile?
No.
Re:How to distribute client credentials to end use by tepples · 2019-03-22 05:53 · Score: 1

If you're using a key for user authentication, each user is going to need to generate their own key and you aren't distributing anything.
I'm talking about OAuth, version 1 or 2. The client ID and client secret in OAuth authenticate an application to a service so that the application can receive a session ID representing the user.
Re: How to distribute client credentials to end us by tepples · 2019-03-22 05:56 · Score: 1

Say a million users install an application. Do you think Twitter would appreciate a million requests to register a nearly identical application, differing only by internal timestamps and compiler optimization flags?
Re:How to distribute client credentials to end use by jeff4747 · 2019-03-22 07:03 · Score: 1

And those are going to need to be generated per-app. Otherwise you aren't authenticating anything.
Re:How to distribute client credentials to end use by tepples · 2019-03-22 08:59 · Score: 1

But what's an "app"? Is it the executable program built from a particular repository, or a particular installation thereof?
Found the patterns by tepples · 2019-03-22 09:13 · Score: 1

The featured article is light on details on the patterns used to determine whether a string is "in the format of particular API tokens or cryptographic keys." GitHub's page about "token scanning" likewise doesn't say much. A link deeper in the article to "git secrets" by Amazon gives regular expressions for Amazon API credentials but not those of other well-known providers. The actual regular expressions used are buried in Table III of a PDF linked near the end of the article.
Fortunately, ZDNet is not paywalled.
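To give a feel for what "text strings formatted like API tokens and cryptographic keys" means in practice, here is a small Python sketch of pattern-based matching. These two patterns are illustrative assumptions based on commonly seen formats (an AWS-style access key ID and a PEM private key header); they are not copied from Table III of the paper, and `find_candidates` is a hypothetical helper name.

```python
import re

# Illustrative patterns only, not the ones used in the study.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_candidates(text):
    """Return (pattern_name, matched_text) pairs for every match in text.

    Results are grouped by pattern, in the order the patterns are defined.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A match only says a string has the right shape. As several comments here point out, telling a live credential from a test fixture or a documentation placeholder takes a further check.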
Re:How to distribute client credentials to end use by jeff4747 · 2019-03-22 09:40 · Score: 1

Take a second to think about this.
If every single install of a program, anywhere on the planet, uses exactly the same identity, how do you know who to let in and who to keep out?
You wouldn't. Which is why you don't give everyone the same identity just because they're running the same executable.
How are API keys actually secure at all? by Miamicanes · 2019-03-22 12:51 · Score: 1

Suppose I write an Android app that uses Google Maps & requires an API key. I dutifully follow Google's instructions, build my app with the key in Strings.xml, compile it, sign it, and publish it to Google Play.
What, exactly, is there to stop someone from obtaining my app through Google Play, ripping it from their phone, deodexing the binary, extracting my API key, then writing THEIR OWN Maps-using app that uses my API key and distributing it to a million users in China (or anywhere else in the world) so the API calls of ITS users end up getting charged against MY account?
I mean, I certainly hope that Google at least does binary-analysis and would reject an app uploaded to Play that attempted to re-use MY API key... but that still does nothing to address API key security in apps NOT distributed via Google Play.
Likewise, Google could theoretically do some form of sanity-checking, and pass along BOTH my API key AND an ID tied to that specific user via Google Play Services, and reject the key's use if the user had never actually installed my app... but that wouldn't help with something like Amazon API keys.
I could see it being secure if API keys offloaded responsibility for registration and payment directly to end users, so that apps would be distributed with dummy keys that were replaced by "real" ones after end users directly authenticated THEMSELVES to Google Maps, Amazon AWS, etc., and used MY app with THEIR key at THEIR OWN expense... but I just don't see how the current way API keys are used with things like Android apps is in any way even REMOTELY secure against key-hijacking.
This is a MAJOR reason why I've always been afraid to publish any Android app that depends upon having an AWS key. At least Google will allow you to distribute a Maps-API-using app with a key that simply gets disabled if you go over the free limit. Amazon won't. They'll give you a certain free allowance, and it might even be fairly high... but Amazon WILL NOT give you the courtesy of letting you set any kind of absolute circuit breaker and tell them, "if my app exceeds {free-allowance | some-absolute-limit}, just kill it dead and stop any additional charges from accruing". Even worse, Amazon's billing isn't realtime... you can set up all the safeguards you want to notify you if your usage exceeds some amount and kill your account, and even to try and kill it via scripted means... then see the amount you owe CONTINUE to skyrocket AFTER you've pulled the plug due to use that happened before you suspended the service, but AFTER the most recent balance update.
Frankly, Amazon's AWS billing policies (above and beyond any concerns about the security of distributing an app with my API key compiled into it) scare me shitless, and apparently I'm far from being the only one who feels that way. I, for one, will NEVER willingly give someone a literal blank check to bill me into bankruptcy.
1. Re:How are API keys actually secure at all? by Anonymous Coward · 2019-03-22 15:43 · Score: 0
  
  API keys for third party services shouldn't be bundled client side, but kept on your server. After the user authenticates with your server, it acts as a proxy to the third party service. Granted this isn't always feasible when integrating a client side library like Google Maps on Android.
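The proxy arrangement described in the reply above could look roughly like this on the server side, sketched in Python. The upstream URL, the `X-Api-Key` header name, and `build_upstream_request` are all assumptions made up for illustration; a real third-party service defines its own endpoint and its own way of passing the key.

```python
import urllib.parse
import urllib.request

# Hypothetical upstream service; replace with the real provider's base URL.
UPSTREAM_BASE = "https://api.example.com/v1/"


def build_upstream_request(path, params, server_side_key):
    """Build (but do not send) the request the server makes for a user.

    The caller is expected to have authenticated the user already. The
    third-party key is attached here, on the server, so it never ships
    inside the client application.
    """
    url = urllib.parse.urljoin(UPSTREAM_BASE, path.lstrip("/"))
    query = urllib.parse.urlencode(params)
    if query:
        url = f"{url}?{query}"
    request = urllib.request.Request(url)
    request.add_header("X-Api-Key", server_side_key)  # assumed header name
    return request
```

The server would pass the result to `urllib.request.urlopen` and relay the response. Because every call goes through your own server, you can also rate-limit per user, which partly addresses the runaway-billing worry raised in the parent comment.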
API's were ruled public information by Anonymous Coward · 2019-03-24 21:11 · Score: 0

Hi folks,
Remember the landmark hearing between Google and Oracle?
APIs are supposed to be public information for any service that wishes to be accessible. Welcome to the internet.
So, M$ takes over @github and now they start sponsored research to make API publication and key sharing via git repos look bad? Or is my take too cynical? Their reputation is of the darkest colour, in my three decades' experience of the direction of company policies and competitive practices.
And some string magic to google up cryptographic keys? In many cases these are uploaded and shared development key files or included key code to enable services via public key encryption.