Ask Slashdot: How To Both Mirror and Protect Crowdsourced Data?
New submitter cellurl writes "I run wikispeedia, a database of speed limit signs. People approach us to mirror our data, but I am quite certain it will become a one-way street. So my question is: How can I give consumers peace of mind in using our data and not give up the ship? We want to be the clearing house for this information, at the same time following our charter of providing safety. Some thoughts that come to mind are creating a 'Service Level Agreement' which they will no doubt reject, or MySQL-clustering, or rsync. Any thoughts, (technically, logistically, legally) appreciated."
First, you'll only be THE clearing house if you are the best source. Second, it's public data. Stop trying to own it; you can't, because it's not yours to own in the first place.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
There are plenty of publicly accessible sites that mirror data from trivial to critical. I would contact a few of them and see what agreements they have in place, if any.
I would think you would want to make sure they note their data is a mirror, and that updates should be sent to your site. That might be handled by doc files for each file, or some type of about file in each directory. You probably want something like that if for no other reason so as to note metadata.
I've seen quite a few sites that prefer that you go to a mirror to download actual data.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Consider teaming up with a seasoned negotiator with good business sense, or hiring an attorney -- or both. If there is any value in your dataset, those who got in touch with you will not reject fees, SLAs, reciprocal updates, etc. It all depends on how much data you have, and how accurate it is.
On a separate note: your site is dysfunctional on my tablet. I'm left wondering what it's about or how it's supposed to work.
Create an API and provide an interface through which your client base can work with the data.
A lot of places out there do this, as the data is considered intellectual property.
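A rough sketch of what such an API lookup might look like, as opposed to handing out a full mirror. Everything here is made up for illustration: the record fields, the `API_KEYS` set, the `signs_near` function and the sample data are assumptions, not anything wikispeedia actually exposes.

```python
import json
import math

# Hypothetical sample records; the field names are assumptions.
SIGNS = [
    {"id": 1, "lat": 40.0000, "lon": -75.0000, "mph": 45},
    {"id": 2, "lat": 40.0010, "lon": -75.0010, "mph": 25},
    {"id": 3, "lat": 41.0000, "lon": -76.0000, "mph": 65},
]

API_KEYS = {"demo-key"}  # hypothetical set of registered clients


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def signs_near(api_key, lat, lon, radius_km=1.0):
    """Return a JSON string with the signs within radius_km of (lat, lon)."""
    if api_key not in API_KEYS:
        return json.dumps({"error": "unknown API key"})
    hits = [s for s in SIGNS
            if distance_km(lat, lon, s["lat"], s["lon"]) <= radius_km]
    # The "source" field keeps attribution attached to every response.
    return json.dumps({"source": "wikispeedia", "signs": hits})
```

Clients get answers to queries rather than the whole table, and every response carries the attribution. Wrapping `signs_near` in an actual HTTP endpoint is left out here.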
It's not a typo if you understood the meaning!
Might not translate exactly, but look into how the OpenBSD project mirrors their stuff. There is the main site, and tons of mirrors. Everything is hashed. Grab a mirror; if you don't trust it, get the hashes from the main site and check the files. Not sure if it would scale to what you're doing. And what do you mean by 'giving up the ship' exactly?
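The hash-and-check idea can be sketched in a few lines. This is only the general technique, not OpenBSD's actual tooling; the function names and the sample file contents are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Main site: map each file name to the SHA-256 of its contents."""
    return {name: sha256_hex(content) for name, content in files.items()}


def verify_mirror(mirror_files: dict, manifest: dict) -> list:
    """Return the names of files that are missing or differ from the manifest."""
    bad = []
    for name, expected in manifest.items():
        content = mirror_files.get(name)
        if content is None or sha256_hex(content) != expected:
            bad.append(name)
    return sorted(bad)


# Hypothetical data: the main site publishes the manifest, mirrors carry the files.
official = {
    "signs.csv": b"id,lat,lon,mph\n1,40.0,-75.0,45\n",
    "README": b"Source: main site\n",
}
manifest = build_manifest(official)

tampered = dict(official)
tampered["signs.csv"] = b"id,lat,lon,mph\n1,40.0,-75.0,75\n"

print(verify_mirror(official, manifest))  # clean mirror
print(verify_mirror(tampered, manifest))  # mirror with an altered file
```

The manifest is tiny compared to the data, so the main site only has to serve the hashes while the mirrors carry the bulk downloads.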
License the mirroring only in the event that:
1. It's visibly acknowledged that you are the source site
2. Updates are either sent directly to you, or are forwarded to you by the other site within a time limit
3. All content on your site, including that sent to you by another (mirror) site, is watermarked as belonging to your site. For pictures, this would be a visible watermark on the picture.
Be the best
Make all information free
Choose a good licence
Expect to be taken over one day by something better; when that comes along ... help them
Make it easy for anybody to use your information
It is counterintuitive, but the moment you put up protective barriers, you fall over. The moment you depend on an artificial barrier to protect your lead is the moment the quality of your product starts to degrade. It happens every time to products and services that grew on openness and then decide that they are good because of their own qualities rather than because of that openness.
If you develop a product/service based on a closed environment, that is a different story. It makes business sense to improve your model based on a closed environment until a disruptive product/service comes along.
This is a compilation of public data, with the legwork being done by others. You've got no real legal option for protecting the data, at least in the US. You could perhaps try some technical means of controlling the data, but that would greatly reduce its utility. I would also consider it unethical to try and 'own' the results of work done by other unpaid volunteers. If you wish to be the center of this data collection, then make it as useful as possible.
This is my signature. There are many like it, but this one is mine.
You want to "Protect...Data", "not give up the ship", "follow...our charter of providing safety". But what is it that you don't want mirrors to do with the data? Less verbiage, more clarity, please.
If it's safety you want, I don't understand why you are trying to get other sites to freely back up your data.
Get a real backup service and tell people how it's backed up, poof! safety.
Or if you want to make a community resource, you can do what SourceForge, ibiblio, etc. do: free mirrors that point back to your site.
You don't want to mandate people give you data. That will just get you bad data. Instead, make it as easy as possible for them to do - APIs, easy web forms, any method you can think of that will make the barrier to entry as low as possible. Encourage them to use it, but relax and set your data Free and don't try to force it. It's like Wikis... Somehow it works out OK.
Fully Homomorphic Encryption. FHE. See http://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_homomorphic_encryption
From the home page "the sign you capture is copyrighted with your name since you found it".
How on earth can you copyright a speed sign, and even if you could, how can that copyright be relevant to anything?
The location and speed limit of a speed sign is a fact. How can that be copyrighted? How can it limit the rights of others who observe the sign to publish its location and speed limit?
If anybody were entitled to copyright a speed sign, it would be the authorities that put it there and who actually own it. How can the location of other people's property be copyrightable? Looks like somebody took the concept one step too far...
Gets a thousand years of bad karma.
If you were blocking sigs, you wouldn't have to read this.
I've been working with phone directories for a few decades, where many companies are in basically the same position that you are: making a living from public information. Most data is collected from phone companies that dump their customer databases to the phone directory companies. This process and the associated tariffs are regulated by law. This data must be processed and cleaned up before it is passed on. Then there are data consumers. In the old days these were people reading the phone books. These days, data consumers are people browsing the web and all sorts of web apps that connect to the phone book through one of several APIs. Most telephone directory companies provide search APIs for their databases, usually not for free.
Everything is a one-way street, of course. Information flows downstream, money goes upstream. No phone directory company that I know of will voluntarily mirror its database to anyone. Search APIs, yes. Mirrors, no. Phone directories are sometimes distributed to consumers and businesses on CD/DVD, but never without at least an attempt at scrambling the data and restricting its usage.
You could probably make a business for a while selling an open, mirrored copy of your database. People will pay for subscriptions. The problem is, any one of your customers could choose to become your competitor at any time. The more successful you are, the more likely someone is to do that. Maybe you can protect yourself legally, but most people prefer to lock their door even in jurisdictions where trespassing is forbidden. Competition in your area would be nice for everyone else, your customers as well as your competitors, so as a member of "everyone else" I should say go for it. But you're no dummy. You got your company name posted on Slashdot after all!
I may be wrong, as the OP didn't mention budget!
However looking at their site, I'm guessing they're desperate to keep costs to an absolute minimum - correct me if I'm wrong (please), I think the S3 would be potentially quite expensive?
I *think* the OP is looking for crowd-source solutions, i.e. a way for people to run mirrors themselves whilst maintaining integrity and copyright(s).
I think therefore I am... a Linux geek.
Just like the GPL, but it also closes the loophole that allows you to use an open source tool in SaaS without giving back. I would investigate this licence. Also, map makers usually put distinctive deliberate mistakes in their maps to prove when data has been copied.
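The deliberate-mistake trick could be applied to a sign database by seeding each mirror's copy with a few fictitious records that only you can regenerate. A minimal sketch, assuming records are simple `(lat, lon, mph)` tuples; the `SECRET` key, the coordinate ranges and all function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-private-key"  # hypothetical; kept off the mirrors


def canary_records(mirror_id: str, count: int = 3) -> list:
    """Derive fictitious (lat, lon, mph) records unique to one mirror."""
    records = []
    for i in range(count):
        msg = f"{mirror_id}:{i}".encode()
        digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
        # Map digest bytes onto an arbitrary, plausible-looking range.
        lat = round(25 + int.from_bytes(digest[0:4], "big") / 2**32 * 24, 5)
        lon = round(-124 + int.from_bytes(digest[4:8], "big") / 2**32 * 57, 5)
        mph = 25 + 5 * (digest[8] % 10)
        records.append((lat, lon, mph))
    return records


def seed(dataset: list, mirror_id: str) -> list:
    """Copy of the dataset with this mirror's canaries mixed in."""
    return sorted(set(dataset) | set(canary_records(mirror_id)))


def identify_source(suspect: list, mirror_ids: list) -> list:
    """Mirror ids whose canaries all appear in the suspect dataset."""
    found = set(suspect)
    return [m for m in mirror_ids
            if all(c in found for c in canary_records(m))]


real = [(40.0, -75.0, 45), (40.001, -75.001, 25)]
copy_for_a = seed(real, "mirror-a")
print(identify_source(copy_for_a, ["mirror-a", "mirror-b"]))
```

Because the canaries are derived from a private key, nothing extra needs to be stored per mirror. For a safety-oriented dataset you would want the fake entries placed where no road exists.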
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
You could fix that first.
OpenStreetMap
Extra alerts at the right times are useful.
I've seen signs disappear because someone ran over them.
I've seen signs disappear because kids stole them.
I've seen trees grow around signs and obstruct the view of the signs.
These are DoT issues that should be resolved ASAP, but until then it might be useful to know that the 45mph limit dropped to 25 suddenly due to being very near a school for the blind and oh, by the way, the untrimmed bushes grew over the sign. No locals bother to report it because they know the speed limit. My out-of-town behind, on the other hand, has no clue because the sign is behind a tree, and I don't know I need to report it!
This isn't a site for telling you where you can avoid speed traps or something; this is a site simply republishing public data in an electronic format.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager