McAfee Grabbed Data Without Paying, Says Open Source Vulnerability Database
mask.of.sanity (1228908) writes with this excerpt from The Register: "'Intel security subsidiary McAfee may be in hot water after it allegedly scraped thousands of records from the Open Source Vulnerability Database instead of paying for them. The slurp was said to be conducted using fast scripts that rapidly changed the user agent, and was launched after McAfee formally inquired about purchasing a license to the data.' Law experts say the site's copyright could be breached by individuals merely downloading the information in contravention to the site's policies, and did not require the data to be subsequently disseminated."
"McAfee Grabbed Data Without Paying, Says Open Source Vulnerability Database"
Smash and grab? I bet he is hiding out in Ecuador.
open "sourced", not "open source."
http://osvdb.org/about
I was confused about how someone could be charged for access to "open source" information...
Here's the NPO, with two officers, backing it:
http://opensecurityfoundation....
Please help metamoderate.
Federal prosecutors charged him with two counts of wire fraud and 11 violations of the Computer Fraud and Abuse Act, carrying a cumulative maximum penalty of $1 million in fines, 35 years in prison, asset forfeiture, restitution and supervised release.
I'm no McAfee advocate by any means, but the span of time between the initial sales consultation and the unauthorized scraping indicates that the person involved with the scraping might not have been involved with the sales process and was ignorant of the need for a PO. The clumsy way they scraped without even trying to conceal their user agent indicates incompetence, rather than malice. Of course, McAfee's size and influence hold them to a higher standard that should preclude anyone running rogue like this.
Gamingmuseum.com: Give your 3D accelerator a rest.
McAfee did nothing different than what millions of people do every day via TPB.
I would argue there's a bit of a difference. If true, McAfee is using this illegal data for *profit*, as opposed to just using it for entertainment/personal use. I think a more analogous scenario would be grabbing a movie via TPB and then charging your friends to watch it with you.
Hi, MS programmer here. I caused most of those vulnerabilities, so actually it is MY data.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
They offer the info free for personal use, but expect commercial users to pay to support their efforts. McAfee knew this.
Regardless of the legality, it was ethically wrong.
"National Security is the chief cause of national insecurity." - Celine's First Law
the TPB guys were making a lot of money off TPB
If this makes the crappy antivirus that is bundled on your parents' computer a little less crappy, can you really complain?
lose != loose
Actually, in the US, the data belongs to whoever collects it, not who it is about. If the scraped site has a terms and conditions page, McAfee will be sued on that, and that will be compounded due to the fact they were in discussions about buying the data.
FYI if you want to use open source in a closed source / commercial project then often you do have to pay for it, depending on the licence it's open sourced under.
Based on their web site and description, "OSVDB" may have started out as an "open source database", but now it seems to have morphed into something that is effectively a commercial data aggregator and vendor hiding behind a non-profit and giving out limited, free samples. In any case, whatever it is, their database clearly is not "open".
This data is not illegal, and it would seem like it's probably not protected by copyright under US law, since it is most likely a collection of data lacking originality. Even if it is copyrightable, I would say it's still unethical to restrict the flow of this data more so than other data.
This is my signature. There are many like it, but this one is mine.
If you have to pay for it, it sure as hell ain't open source.
Wrong. It is perfectly legal to charge for open source (GPL, BSD, etc).
Open source lets the customer modify, improve and fix the software, instead of being at the mercy of the software author.
The OSVDB went pay a few years ago. They have a wealth of interesting information and used to be fully open source; however, due to lack of community involvement, they decided that the open source model wasn't working for them. If the OSVDB has a problem with people scraping their site, they should really update (or in their case - create) their robots.txt. I was interested in this data myself a year or so ago until I found out they wanted me to pay a subscription to access information I can view for free on their website and screen scrape for free if I really wanted to. Furthermore, I noticed that Google has completely cached their site because they take no preventative measures against it. If anyone wanted this data, they could easily screen scrape it from the Google cache and the OSVDB would be none the wiser. Why should anyone pay for data that the OSVDB has literally done nothing to protect?
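For anyone wondering what a robots.txt would actually buy them, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the /show/ path, the example.org host and the "ExampleScraper" agent name are all made up for illustration; this is not OSVDB's real file or URL layout. Keep in mind that robots.txt is only advisory: well-behaved crawlers honor it, and a script that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption, not OSVDB's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /show/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before fetching; "ExampleScraper" is a made-up agent.
may_fetch_entry = parser.can_fetch(
    "ExampleScraper", "http://example.org/show/osvdb/1234"
)
may_fetch_about = parser.can_fetch("ExampleScraper", "http://example.org/about")
delay_seconds = parser.crawl_delay("ExampleScraper")

print(may_fetch_entry, may_fetch_about, delay_seconds)
```

A crawler that respects the file would skip the entry pages and wait between requests; one that rotates user agents to avoid detection would not bother checking at all.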
TPB offers their information (torrent files, last time I looked) freely. I assume you mean the content many/most of those torrents point people to... and yes, pirating things is also unethical. Having said that, I believe that an ethical violation for commercial gain is more egregious.
"National Security is the chief cause of national insecurity." - Celine's First Law
Considering McAfee has long since made the jump from antivirus to full-blown virus/malware, what were they expecting?
Make a man a fire and he will be warm for a day, set a man on fire and he will be warm for the rest of his life
Not all data is protected by copyright. If someone makes data available on a website that is not protected by copyright, then it's perfectly legal to scrape it. (At least by U.S. law.) The posting of a license on a website makes no difference where there are no copyrights in the material copied. By posting web pages and data in a location available to the public, the website granted an "implied license" to copy the pages and data.
Copyrights attach to "works of authorship". A database can be such a work, but simple data in a database probably isn't. If the scraping engine looked up the unprotected data in the database without copying substantial parts thereof (as seems to be the case from the article), then no copyrights were infringed.
So I'd have to ask the question: what did McAfee scrape, and was it a "work of authorship"? If all they got was the fingerprints, filenames and names of viruses/vulnerabilities, then I'd have to say "no".
This will be one of the times that I shout "hurrah" for McAfee!
It's behind Cloudflare, and they're leveraging other means to catch scraping. This hardly seems like "wide open"
paul reinheimer
It's not real like a car, it's digital. Everyone should have access to it for free.
> McAfee did nothing different than what millions of people do every day via TPB.
The difference is that while TPB may be dicks, they are fighting even bigger dicks: the MPAA.
McAfee is a dick, but it is screwing over non-dicks.
---Saying GNOME 3 is better than Windows 8 is not so much a compliment as it is damning with light praise.
I've been using Linux since 1998. I don't need a lecture on open source licensing.
Charging for access to data is fundamentally incompatible with claiming it's "open source" by many people's definitions.
Please help metamoderate.
Any original (non-plagiarized) content can be copyrighted. Further, if the site has an account signup license that states that "vulnerability report submitter assigns his/her posts' copyright to website so that it can modify, reproduce that post as it sees fit," then yes, you cannot mass copy the database freely without violating copyright laws.
"Open Sourced" has a different meaning in the context in which they use it: they are talking about how they get their data from many sources, including volunteers.
http://osvdb.org/osvdb_license
Isn't this what Aaron Swartz did? Is the US Government going to "make an example" of McAfee too?
McAfee left the company over twenty years ago
Any original (non-plagiarized) content is copyrighted by default. Further, if the site has an account signup license that states that "vulnerability report submitter assigns his/her posts' copyright to website so that it can modify, reproduce that post as it sees fit," then yes, you cannot mass copy the database freely without violating copyright laws.
FTFY
Doesn't matter if the data is free or not - if you're circumventing access restrictions, it's effectively breaking in (not like most of us haven't done it, but still).
Ethical simply means following a consistent ethic (rule). So "I steal everything I can, and some I can't" is immoral, but ethical as long as that is the rule you consistently follow.
Which is why I hate the use of the word "ethical" in our society. It's a lie.
Bill Clinton was our most ethical president ever.
And if anyone didn't know ahead of time what was going to happen to whistleblowers with "the most transparent administration ever", they didn't understand the meaning of "transparent".
Hint: I absolutely despise modern language.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
The default copyright goes to the author, not the website, unless the author assigns it to the website. Hosting a comment on your website does not mean you own it, at least that's what I think. You have to get express permission from the original copyright holders, the authors, to legally obtain copyright.
Then why aren't the developers of Linux kernel getting paid?
I think the question you're looking for is "Why are only 83.1% of the developers of the Linux kernel getting paid?"
Wait, wha.. OH! For a second I thought this was another zany article about John.
THIS SPACE INTENTIONALLY LEFT BLANK.
That statistic is only after March 2012, when the kernel was more or less stable. What about 20 years' worth of work before that? I don't think most of those developers have been paid. Also, making little changes to a stable product is easier than creating it from scratch.
There is no copyright in facts, which is why the Register article says there is a "debate" about copyright protection in databases. If a database is nothing more than a collection of facts, it won't be eligible for copyright protection. (It might be eligible for a database protection right in Europe, though.)
That said, databases can be copyrighted if they contain original creative content, or if the selection and arrangement of the facts is original and creative. The article hints at a sweat of the brow justification, which would not work - just because you spend a lot of time compiling facts doesn't mean you get copyright in them (well, at least not in the U.S.). But the threshold for originality and creativity is pretty low, so if OSVDB does any editing or categorization or summarizing of reports, that might be enough to get them copyright in the database.
From a purely legal perspective, Swartz's intentions would probably be considered "worse." He mass-downloaded a bunch of articles from JSTOR (and no, I doubt all of them or even most of them were funded with public money), although he arguably had the right to do so. From what I understand, his intention was to release the articles to the public, but he never got that far. Had he done so, that would certainly have been a massive copyright violation, and there would have been multiple suits from multiple publishers (meanwhile, I'd imagine most of the authors of the articles wouldn't care, since they rarely if ever receive royalties for those articles, and often have to pay fees to have them published).
Whereas McAfee scrapes data from a publicly-accessible database that may or may not be protected by copyright. OSVDB will first have to prove they have a valid copyright in order to claim infringement. Maybe they'll fall back on this argument that even if not copyrighted, the data was licensed, but it's hard to throw up uncopyrighted data on a public web page and claim that there is some kind of binding license on everyone who accesses it. When uncopyrightable databases are licensed, that will usually involve signing a contract.
"Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
Exactly. It's protected by copyright. Whether the copyright holders have granted the public permission to copy their content and use it for commercial gain is another issue (that is going before the courts).
OSVDB is notorious for scraping NVD (NIST National Vulnerability Database) and both follow CVE and CCE standards that are maintained by Mitre. Both OSVDB and NVD are public vulnerability databases maintained by outside submissions. NVD/OSVDB do not conduct any kind of vulnerability discovery activity.
I don't see how OSVDB can claim any rights to this data. They certainly didn't produce it. Thankfully, if they are stupid enough to claim it, NIST will quickly put them in their place.
At least in North America, facts (which is what SV data is) are not considered to be copyrightable. (In Europe I believe there is some protection for databases.) This might be a ToS violation, but I think most Slashdot'ers would agree those are questionable and that public websites should not have different protection from the phonebook delivered to your door. (Yellowpages has previously complained about Google and others "copying" that.)
As someone who looks at SV data regularly and has previously pointed things out to OSVDB maintainers, I would also point out that the majority of the OSVDB database is simply a clone of CVE, thus in reality isn't even "theirs".
> From what I understand, his intention was to release the articles to the public, but he never got that far.
As far as I know, there is no evidence for this, except circumstantial (feel free to reply with supporting evidence). You could very well be correct, or he could have had a more nuanced plan, like only releasing the public domain stuff first, or threatening to do so, and somehow hoping to leverage that to achieve other goals (like, for example, the subsequent JSTOR relaxed access policy which enables private individuals to access 3 papers for free every two weeks), but now we will never know.
How is Swartz worse? He may have intended to commit massive copyright violations, but he DID not. And he had rights to this information per JSTOR's own terms of service. He was going to be prosecuted for 50 years to life for a thought crime. If thought crime is worse than actual crime, that is a big problem.
OSVDB says there is a debate about whether this information is copyrightable, but they aren't pursuing that angle.
If McAfee workers read these documents to improve software that they are developing, then that's a commercial use and it violates the terms under which the information was provided.
The site was heavily littered with ads and adware. What do you think all the porno ads and whatnot were for if not monetising the site?
I disagree, and I've never heard anyone give that as a definition of ethics. Often, "ethics" and "morals" are used interchangeably. But I believe that in common usage ethics implies following "the golden rule," whereas morality is based on a more personal (perhaps religious) belief. For example, some might believe sex outside of marriage to be immoral, but it would be rare to find someone claiming it to be unethical.
"National Security is the chief cause of national insecurity." - Celine's First Law
This brings up an interesting conundrum about copyright... So, if I scrape TRW (Sorry, Experian)'s website and it's only to download information about MYSELF, who's got the copyright on that? Experian is supposed to provide the information for free to me anyhow, on request, so, can I be charged with a crime for taking it without asking?
And let's talk about all the other thousands of companies (Facebook, Google, United Healthcare, BlueCross, Amazon, Slashdot, yadda yadda yadda) that collect and resell information about me. Who owns that information about me? And isn't it sad that I can't get to all that information about me? In fact, I seem to spend most of my time now making sure that what information about me is out there is wildly inaccurate, and if it's something I made up in a web form, then it should be copyright ME, no???
If telephones are outlawed, then only outlaws will have telephones.
APK once again misses
The obvious--that is,
The barn-sized difference
Between libre and gratis
BURMA SHAVE
cat
Uh...JFGI? There are a ton of articles on the advertising profits made by the likes of TPB.
Here is a more recent one
I remember reading an interview with the guys a few years ago, and apparently each of the prime flash slots along the sides of the site run at $20k per month.
The first link in the article is for The Linux Foundation, who have been publishing the same report since at least 2008, when a minimum of 70% of the contributors (including people who submitted one-line fixes) had corporate sponsorship. Even before then it is easy to see who the top contributors to Linux were -- Kernel maintainer Alan Cox was employed by Red Hat from 1999 to 2009. Ted Ts'o worked with MIT, VA Linux and IBM while he developed /dev/random and the ext2 file system. Jon "maddog" Hall was the man responsible for making Alpha the second architecture Linux ran on while he worked with Digital. Prior to his employment with Transmeta and the Linux Foundation, Linus Torvalds was paid $20,000,000 in stock options by Red Hat and VA Linux.
Even before the majority of kernel development was done with corporate sponsorship, it was done to further academic goals. While not every one of these people is a dot com millionaire for their work with Linux, calling it a product of slave labour is disingenuous at best.
The copyright of a cookbook is in the curation: the choices of inclusion, exclusion, and order of recipes. The white pages of a phone book are not copyrightable because they lack originality in those areas. I suspect that a vulnerability database is more like a phonebook than a cookbook in that particular regard.
This is my signature. There are many like it, but this one is mine.
You are correct that any original content can be copyrighted, but are incorrect about the meaning of 'original.' I have doubts that this database could stand up in court due to the precedent set by Feist v. Rural.
This is my signature. There are many like it, but this one is mine.
Yeah, I also read something suggesting he wanted to do some text mining on the articles to find bias in corporate funded research. I think it was the prosecution pushing the idea that he wanted to release the articles, based on quotes from the Guerilla Open Access Manifesto, etc.
"Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
Well, he was going to be prosecuted primarily for violations of the CFAA, not copyright infringement.
Anyway, the point I was trying to make is that I'm not convinced that OSVDB has any exclusive right to the information, period. If they don't have any exclusive right to it, then they can try and "license" it all they want, but it doesn't matter. You don't get to just throw up a bunch of factual, non-copyrighted (and non-copyrightable) information on a public web page, then claim that anyone who doesn't comply with your "license" is doing something illegal... because they're facts. If you want to play that game, you'd better get your audience to sign a contract. There's no trade secrecy here, either, because the information is public.
Maybe OSVDB has some claim for unfair competition under state misappropriation laws, similar to the "hot news" doctrine. But their case would be much more convincing if they had a copyright claim, which even they don't seem convinced about.
Actually, given the way the CFAA is written (and abused), maybe that would cover the situation.
Of course McAfee is probably being a bad citizen here - I assume the point of the license, whether enforceable or not, is to try to defray the costs of establishing and maintaining the database. But simply being a bad citizen isn't necessarily illegal.
"Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
But is this an original work, in the US copyright law sense? Mere compilations of facts are not. (Also, I don't know if such a copyright assignment would work, legally; the usual practice is that a submission implicitly carries a license with some rights.)
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Yeah, I see what you mean. CFAA is overly broad. Any "scary stuff with computer".