Open Source Data Sets? Linux Foundation Introduces 'Community Data License Agreements' (linuxinsider.com)

Posted by EditorDavid on Saturday October 28, 2017 @08:59AM from the come-along-and-share-the-data dept.

"In open source philosophy, you share source code. Why not share data?" writes Slashdot reader princelobga. Linux Insider reports on the Linux Foundation's new Community Data License Agreement, "a new framework for sharing large sets of data required for research, collaborative learning and other purposes." CDLAs will allow both individuals and groups to share data sets in the same way they share open source software code, the foundation said. "As systems require data to learn and evolve, no one organization can build, maintain and source all data required," noted Mike Dolan, VP of strategic programs at The Linux Foundation. "Data communities are forming around artificial intelligence and machine learning use cases, autonomous systems, and connected civil infrastructure," he told LinuxInsider. "The CDLA license agreements enable sharing data openly, embodying best practices learned over decades of sharing source code."
A principal analyst at Pund-IT told the site that the new data license "reflects the growing importance of information as a resource for big data analytics, machine learning and artificial intelligence."
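Sharing a data set "in the same way" as code usually means shipping a machine-readable license tag alongside the files. Below is a minimal sketch of that idea. It assumes a hypothetical JSON manifest format and a made-up data set name ("traffic-counts"). The two identifiers are the SPDX short IDs for the CDLA 1.0 variants. The CDLA itself does not define any manifest format.

```python
import json

# Assumption: data sets are described by a small JSON manifest (a hypothetical
# format, not something the CDLA defines). The two identifiers below are the
# SPDX short IDs for the CDLA 1.0 variants.
CDLA_LICENSES = {"CDLA-Permissive-1.0", "CDLA-Sharing-1.0"}


def make_manifest(name, files, license_id):
    """Build a data set manifest that records which CDLA variant applies."""
    if license_id not in CDLA_LICENSES:
        # This sketch only accepts the two CDLA identifiers.
        raise ValueError(f"not a CDLA identifier: {license_id}")
    return {"name": name, "license": license_id, "files": sorted(files)}


manifest = make_manifest("traffic-counts", ["2017-10.csv", "2017-09.csv"], "CDLA-Sharing-1.0")
print(json.dumps(manifest, indent=2))
```

Using a standard identifier rather than free-form license text lets tooling check a data set's terms the same way package managers already check code licenses.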

31 comments

Open sores all over by Anonymous Coward · 2017-10-28 09:28 · Score: 0

systemd is open sores
Free Blowjobs in Seattle by Anonymous Coward · 2017-10-28 09:32 · Score: 0

Call Craig @ 425-481-0289.
1. Re:Free Blowjobs in Seattle by Anonymous Coward · 2017-10-28 11:19 · Score: 0
  
  I'm Devin Branson. A friend told me you posted my phone number here. Why?
Binary data? by Anonymous Coward · 2017-10-28 09:32 · Score: 0

Don't put binary data to the source repository.
1. Re:Binary data? by UnknownSoldier · 2017-10-28 17:31 · Score: 1
  
  Obviously you've _never_ worked on any games. Art, Music, Levels, Previous Executables, etc. are all valid things to keep in a repo.
  Stop using shitty source repositories that don't know how to handle binary blobs.
  Scientists should have been doing this all along so they can get independent confirmation.
2. Re:Binary data? by Anonymous Coward · 2017-10-28 18:42 · Score: 0
  
  Who said anything about binary data?
3. Re:Binary data? by Anonymous Coward · 2017-10-28 23:18 · Score: 0
  
  Plain text may be easily editable, readable, etc.
  Binary data may be hard to edit, read, etc., and worse, may contain politically incorrect content.
4. Re:Binary data? by JustAnotherOldGuy · 2017-10-29 04:24 · Score: 1
  
  Don't put binary data to the source repository.
  Utter bullshit. It's perfectly fine to put binary data in a source repository. Binary data, code tests, unit tests, images, codecs, music/sounds, validation playbooks: all that stuff BELONGS in a source code repository.
  Frankly, if you're doing development at any non-trivial level, you would be an idiot not to store all that stuff in a repository.
  "Our building burned down and we lost all of the art and music and 3-D models we painstakingly made, but we still have the (now useless) source code!"
  
  --
  Just cruising through this digital world at 33 1/3 rpm...
5. Re:Binary data? by Anonymous Coward · 2017-10-29 15:53 · Score: 0
  
  Plain text may be easily editable, readable, etc.
  Binary data may be hard to edit, read, etc., and worse, may contain politically incorrect content.
  So you're advocating for storing things like game art, video files, audio files, images, etc in plain text?
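The plain-text versus binary argument in this subthread depends on how tools tell the two apart. Git, for example, treats a file as binary for diffing when it finds a NUL byte near the start of the content. To my knowledge, it checks the first 8000 bytes. The following is a minimal sketch of that kind of heuristic, not Git's actual code. The sample file names and contents are made up. Note that UTF-16 text, which contains NUL bytes, would fool this check.

```python
def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Heuristic: call content binary if a NUL byte appears in the first `probe` bytes."""
    return b"\x00" in data[:probe]


samples = {
    "notes.txt": "Plain text diffs and reviews easily.\n".encode("utf-8"),
    # The first bytes of a PNG file: signature, then the IHDR chunk length and type.
    "sprite.png": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
}
for name, blob in samples.items():
    print(name, "binary" if looks_binary(blob) else "text")
```

A version-control system can store either kind. The practical difference is whether line-based diffs and merges mean anything, which is why tools like Git LFS exist for large binary assets.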
I would like to explore ways to monetize by Anonymous Coward · 2017-10-28 09:35 · Score: 0

Can this be data mined for target advertising or perhaps some type of political value add? I would like to explore this further.
Open Data? by R3d+M3rcury · 2017-10-28 09:38 · Score: 2

How/Why is this different from the Open Data license?
1. Re:Open Data? by Hognoxious · 2017-10-28 10:15 · Score: 1
  
  It's not strategic. Probably not webscale either.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
2. Re:Open Data? by Anonymous Coward · 2017-10-28 11:24 · Score: 0
  
  It's a bit silly since data cannot be copyrighted. You can't restrict what people do with it, either copy-left or copy-right.
3. Re:Open Data? by Anonymous Coward · 2017-10-28 17:05 · Score: 0
  
  Probably the new one is stored in MongoDB instead of MySQL.
4. Re:Open Data? by imidan · 2017-10-28 18:48 · Score: 1
  
  From their FAQ:
  
  The CDLA is also not an attempt to fix issues with other licenses – in fact we didn’t start with any other license as the model or base but rather went through a requirements gathering process to understand the use cases under which people were struggling with sharing data
  This is a really strange statement to me. They apparently didn't attempt to address any problem with any existing license, but went ahead and rolled their own license without any consideration for the development that others have gone through for open data license agreements. Troubling, because the groups (particularly Creative Commons) that have put a lot of work into open data licenses have been through the growing pains already, and it seems foolish to ignore that.
5. Re:Open Data? by Anonymous Coward · 2017-10-29 08:03 · Score: 0
  
  The Linux Foundation is an industry trade organization of corporations involved in the enterprise use and development of Linux. Whereas CC licenses focus on making creative works accessible for the benefit of the "commons", or society as a whole, the Linux Foundation is primarily focused on business-friendly policies.
87% of the US Population can be uniquely identified... by Anonymous Coward · 2017-10-28 11:05 · Score: 0

by DOB, ZIP, and gender, so we need to be really careful with any "open" data.
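The comment's point is that a few harmless-looking columns can single out individuals when taken together. Before releasing a table, one way to check for this is to count how many rows have a quasi-identifier combination that no other row shares. You can also compute the smallest group size, which is the "k" in k-anonymity. The following is a minimal sketch of both checks. The records are made-up toy data.

```python
from collections import Counter

# Hypothetical toy records; only the quasi-identifiers (DOB, ZIP, gender) are kept.
records = [
    {"dob": "1980-03-14", "zip": "98101", "gender": "F"},
    {"dob": "1980-03-14", "zip": "98101", "gender": "F"},
    {"dob": "1975-07-02", "zip": "98101", "gender": "M"},
    {"dob": "1992-11-30", "zip": "98052", "gender": "F"},
]
KEYS = ("dob", "zip", "gender")


def combo_counts(rows, keys=KEYS):
    """How many rows share each quasi-identifier combination."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


def unique_fraction(rows, keys=KEYS):
    """Fraction of rows whose combination appears exactly once (re-identifiable)."""
    counts = combo_counts(rows, keys)
    return sum(1 for r in rows if counts[tuple(r[k] for k in keys)] == 1) / len(rows)


def min_group_size(rows, keys=KEYS):
    """The k in k-anonymity: size of the smallest group sharing a combination."""
    return min(combo_counts(rows, keys).values())


print(unique_fraction(records))  # 0.5
print(min_group_size(records))   # 1
```

A k of 1 means at least one person is alone in their group. Common remedies include coarsening the columns (birth year instead of full date, 3-digit ZIP prefix) until k reaches an acceptable value.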
Buzzwords by Anonymous Coward · 2017-10-28 11:09 · Score: 0

Open-this, Open-that, open disruption, open game-changer, open paradigm shift. It's time to redraw the buzzword bingo card.
1. Re: Buzzwords by Anonymous Coward · 2017-10-28 14:58 · Score: 0
  
  Open legs and I'll leave the mic on the floor.
Re:87% of the US Population can uniquely identifie by Anonymous Coward · 2017-10-28 11:29 · Score: 0

The AOL data releases proved that most attempts to anonymize data don't work. My old boss was able to figure out my old AOL account and find the posts I made as a teenager. I got fired for that. I live in Seattle, and this is a very anti-free-speech place.
Yeah no copyright on data in the US by raymorris · 2017-10-28 11:51 · Score: 1

That's absolutely right: in the US, at least, there is no copyright on a collection of facts. I don't know whether any other countries allow it on a specific compilation of data. Obviously, copyright on a single, discrete fact wouldn't make any sense.
In the US, a copyright could apply to a creative arrangement and formatting of facts. (Similarly, there is no copyright on musical notes, but there can be on a specific, creative arrangement of notes: a song.)
So under copyright, you can take someone's dataset and distribute it without asking, but in some cases you can't just redistribute their data FILE. You'd need to produce your own arrangement and formatting of the data, if their work is creative.
There are, however, other potential considerations besides copyright: trade secrets, unfair competition, etc. Granting permission under the license would estop the producer from filing suit under these other theories, giving users (and those who redistribute) some assurance that they can do so safely.
It also explicitly disclaims any right under copyright, which *probably* doesn't apply anyway in *most* cases. "We definitely have explicit written permission" is better than "we probably don't need permission".
1. Re:Yeah no copyright on data in the US by St.Creed · 2017-10-28 11:57 · Score: 3, Informative
  
  At least in the EU there is a special addition to copyright, which is specifically aimed at databases. They are considered creative works once effort has been put into compiling them. So for at least the EU, this license makes a great deal of sense.
  The most pressing problem, of course, is that most open data has little documentation and is of little use. But hey, at least we got the legal issues fixed once a license can be slapped onto it. Who needs to actually work with it, anyway?
  
  --
  Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
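raymorris's comment above argues that, in the US, you could redistribute the facts but not someone's creatively arranged file. If so, you would produce your own arrangement of the data. The following is a purely mechanical sketch of that step: read the facts out of a source file and emit them in a newly chosen layout. The CSV content and column names are made-up placeholders. Whether a given re-arrangement is legally enough is the poster's argument; the code does not settle it.

```python
import csv
import io
import json

# Hypothetical source file in someone else's layout; the values are made up.
SOURCE_CSV = """reading,station
17,S2
12,S1
"""


def rearrange(csv_text):
    """Pull out the underlying facts and emit them in a newly chosen layout."""
    rows = csv.DictReader(io.StringIO(csv_text))
    facts = [{"station": r["station"], "reading": int(r["reading"])} for r in rows]
    # Our own arrangement: sorted by station ID and serialized as JSON, not CSV.
    facts.sort(key=lambda f: f["station"])
    return json.dumps(facts, indent=2)


print(rearrange(SOURCE_CSV))
```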
open but reliable? by Anonymous Coward · 2017-10-28 13:37 · Score: 0

And who will validate the accuracy of the data? Or does this open the door for more "fake" data?
Brilliant! by Anonymous Coward · 2017-10-28 13:37 · Score: 0

I had a similar idea, but definitely totally different and pure genius. You see, I realized that like source code, we could be sharing the "source code" of food preparations. In order to accomplish this magnificent feat, I invented something called the OSFPRL (Open Sharing of Food Preparation Recipe License). Now, because of this amazing invention, people can share these 'recipes'.
Don't think about stealing my idea though, I've patented it. It's all mine!
Re:87% of the US Population can uniquely identifie by Anonymous Coward · 2017-10-28 13:51 · Score: 0

Yeah, that totally happened.
Data licenses can be tricky by imidan · 2017-10-28 19:01 · Score: 2

I see several comments already implying that open licenses specifically for data are unnecessary because we already have free open source licenses for code, but they're not the same.
Most of our open source licenses use copyright law as their foundation. Different legal systems treat the idea of copyrighting facts somewhat differently, but in the US, facts aren't copyrightable. That means that trying to apply a FOSS license to data can be fraught: how can a copyright-based license apply if data aren't copyrightable?
In the EU, facts aren't copyrightable, but "databases" are protected, where a "database" is a collection of data that has had value added through efforts to organize it (the EU calls this the "sui generis" database right).
How do you deal with jurisdictional incompatibilities? Good open data licenses spell out the solutions to these conflicts. I see nothing about them in the CDLA. That alone would make me extremely hesitant to try to use it on any data product I publish.
1. Re:Data licenses can be tricky by Anonymous Coward · 2017-10-29 16:20 · Score: 0
  
  Yes, copyright law is what gives legal weight to software licenses like MIT, BSD, GPL, etc., and to content licenses like Creative Commons. It is what makes them enforceable, because the copyright owner can assert those rights. I don't see any equivalent underpinning in these licenses.
  More to the point I can't quite see the problem they are trying to solve. What is the problematic scenario they address?
Depends on what the data is by Anonymous Coward · 2017-10-29 00:28 · Score: 0

The issue with data is that it is likely to come from far more sources than source code does. In development you might have a few people contributing code to a project, all of whom agree to the open source license, but data might come from hundreds of sources. Before you could open it up, you would need either a way to get approval from everyone who contributed or a way for people to opt out of having their data included in your set.
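The two options this comment describes, approval from every contributor (opt-in) or letting people withdraw (opt-out), amount to a filter applied before publication. The following is a minimal sketch of both policies. The `Contribution` record, its fields, and the contributor names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    contributor: str  # who supplied this record (hypothetical field)
    value: float


def releasable(contributions, *, approved=None, opted_out=frozenset()):
    """Filter contributions before publishing a data set.

    Opt-in policy: pass `approved`; only listed contributors are kept.
    Opt-out policy: leave `approved` as None; everyone not in `opted_out` is kept.
    """
    kept = []
    for c in contributions:
        if c.contributor in opted_out:
            continue  # an explicit opt-out always wins
        if approved is not None and c.contributor not in approved:
            continue  # opt-in mode: no recorded approval, so leave it out
        kept.append(c)
    return kept


data = [Contribution("alice", 1.0), Contribution("bob", 2.0), Contribution("carol", 3.0)]
print([c.contributor for c in releasable(data, approved={"alice"})])  # ['alice']
print([c.contributor for c in releasable(data, opted_out={"bob"})])   # ['alice', 'carol']
```

Opt-in is the stricter choice: anyone without a recorded approval is excluded by default. Opt-out publishes everything unless someone objects.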
Ask Equifax by grumpy-cowboy · 2017-10-29 01:10 · Score: 1

They already provide any information you want freely. ;)

--
Will $CURRENT_YEAR be the year of the Linux Desktop?
Good idea but not new by Anonymous Coward · 2017-10-29 03:45 · Score: 0

Never heard of OpenStreetMap?