Slashdot Mirror


The Next Big Step For Wikidata: Forming a Hub For Researchers

The ed17 writes Wikidata, Wikimedia's free linked database that supplies Wikipedia and its sister projects, is gearing up to submit a grant application to the EU that would expand Wikidata's scope by developing it as a science hub. ... This proposal is significant because no other open collaborative project ... can connect the free databases in the world across disciplinary and linguistic boundaries. ...the project will be capable of providing a unique open service: for the first time, that will allow both citizens and professional scientists from any research or language community to integrate their databases into an open global structure, to publicly annotate, verify, criticize and improve the quality of available data, to define its limits, to contribute to the evolution of its ontology, and to make all this available to everyone, without any restrictions on use and reuse.

30 of 61 comments

  1. Jimmy Wales On Crack Again by Frosty+Piss · · Score: 1, Insightful

    Folks, Wikipedia is a starting place, but its ever-changing content contributed by whoever is not acceptable for academic references. This has been discussed before.

    No one references Encyclopedia Britannica in their Masters Thesis...

    --
    If you want news from today, you have to come back tomorrow.
    1. Re:Jimmy Wales On Crack Again by Livius · · Score: 2

      But this is about Wikidata.

    2. Re:Jimmy Wales On Crack Again by Frosty+Piss · · Score: 1

      But this is about Wikidata.

      Yes, and my WordPress blog has an underlying database, too.

      --
      If you want news from today, you have to come back tomorrow.
    3. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 3, Insightful

      Put the pipe down, bro. Pointing out the obvious is cool and all, but kinda OT in this case.

      In case the room was too smokey to see your screen properly - from TFA; "...would expand Wikidata's scope by developing it as a science hub. The proposal, supported by more than 25 volunteers and half a dozen European institutions as project partners, aims to create a virtual research environment (VRE) that will enhance the project's capacity for freely sharing scientific data."

      We're not talking about Wikipedia, but something new that uses Wikidata as its core.

    4. Re:Jimmy Wales On Crack Again by martin-boundary · · Score: 1

      [...] and to make all this available to everyone, without any restrictions on use and reuse.

      The fundamental problem remains, however. Even if scientists curate the data honestly and comprehensively, what's to stop people from taking the material, editing/changing it, and publishing/claiming their version is correct? The only way to protect against this is to make the data read-only downstream, e.g. only credentialed scientists will get to create or modify data - and that's a pretty fundamental restriction on use and reuse.

      Basically, the idea seems contradictory.

    5. Re:Jimmy Wales On Crack Again by Bite+The+Pillow · · Score: 1

      Just read the fucking article this time. I was going to select bits to quote for you, but honestly either you are deliberately misunderstanding or don't possess the reading skills to make sense of words.

      Have someone read it to you, and ask them to use small words if that helps.

      This is not related to Wikipedia, aka "The encyclopedia that like 12 friends of Jimbo can edit" except in that special way where you are related to any random-ass member of humanity.

  2. EU grant by manu0601 · · Score: 1

    I wish them good luck with the EU grant procedure. The procedures are such a maze that usually EU grant experts are required.

    1. Re:EU grant by JanneM · · Score: 3, Interesting

      They have four partner universities and several other research institutions, most or all of whom already have one or more full-time staff dedicated to helping projects with their grant application process.

      Yes, EU grant applications are big and cumbersome - though the payoff is commensurate - but the process is not going to be the main hurdle. With all the available expertise at their disposal, if they can't navigate the application process then they're unlikely to successfully steer a major project over several years either.

      --
      Trust the Computer. The Computer is your friend.
  3. oh great, empire building at wikipedia by Anonymous Coward · · Score: 2, Interesting

    Wikipedia started out as a web site where volunteers could edit articles before entry into the Nupedia website. Nupedia is now dead. Wikipedia has been engaging in bigger fund raising drives, and has more paid employees. Now it is trying to do more stuff to justify those extra employees, just like when Wikipedia spent a bunch of money trying to develop better Wikipedia page editing software. I bet the heads of Wikipedia now have bigger salaries.

    I would just like the number of humans maintaining Wikipedia to be small once again, and not try to do anything else.

    1. Re:oh great, empire building at wikipedia by DerekLyons · · Score: 1

      I would just like the number of humans maintaining Wikipedia to be small once again

      The number of humans editing Wikipedia is small - they've driven everyone else off.

  4. Mod parent DOWN by Anonymous Coward · · Score: 1

    Haters gonna hate. That doesn't mean they're not stupid though. Get a grip. Wikipedia is an invaluable reference.

  5. Editable scientific data? by Nutria · · Score: 3, Insightful

    I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:Editable scientific data? by ranton · · Score: 5, Insightful

      I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

      Um, have you never heard of versioning? It would be pretty trivial to add the statement "Used the XXX v3.5.1 dataset to perform these calculations" to your research paper.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    2. Re:Editable scientific data? by Nutria · · Score: 1

      (1) Wikidata would either have to keep (many) multiple copies of possibly quite large data sets, or keep diffs. How much of a strain does it put on a busy server to generate a dataset from a huge original and lots of large diffs?

      (2) Not too many people pay attention to Wikipedia changelogs. If only the current form of the data is easily visible, that's what most people -- especially amateurs and those with political motivations -- will use.

      --
      "I don't know, therefore Aliens" Wafflebox1
    3. Re:Editable scientific data? by ranton · · Score: 1

      Wikidata would either have to keep (many) multiple copies of possibly quite large data sets, or keep diffs. How much of a strain does it put on a busy server to generate a dataset from a huge original and lots of large diffs.

      First off none of the problems you list are unmanageable; they just make it more expensive and more difficult to design. One technique could be to only store data sets from published papers. Versions cited in published papers and the latest data will be the ones most frequently accessed, and all other versions could be handled with diffs. They may even decide to only use diffs, but keep track of which versions are most frequently downloaded and store them in full. There are many more ways they could architect the system to handle these issues.
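      The storage scheme described above (full copies for cited or frequently downloaded versions, diffs for everything else) can be sketched roughly as follows. This is only an illustration, not how Wikidata stores anything: the `VersionStore` class, its `promote_after` threshold and the row-per-line dataset format are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def make_diff(old, new):
    """Return the changed regions that turn list `old` into list `new`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    # Each entry: (start in old, end in old, replacement rows).
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_diff(old, diff):
    """Rebuild the newer version from `old` plus a diff from make_diff."""
    out, pos = [], 0
    for i1, i2, rows in diff:
        out.extend(old[pos:i1])  # unchanged rows before this region
        out.extend(rows)         # replacement rows (empty for a deletion)
        pos = i2
    out.extend(old[pos:])
    return out


class VersionStore:
    """Hypothetical store: diffs for every version, full copies for a few."""

    def __init__(self, rows, promote_after=3):
        self.full = {0: list(rows)}  # version -> full snapshot
        self.diffs = {}              # version -> diff from version - 1
        self.latest = 0
        self.latest_rows = list(rows)
        self.downloads = {}          # version -> download count
        self.promote_after = promote_after  # assumed popularity threshold

    def commit(self, rows, cited=False):
        """Add a new version; keep a full copy if a paper cites it."""
        self.latest += 1
        self.diffs[self.latest] = make_diff(self.latest_rows, rows)
        self.latest_rows = list(rows)
        if cited:
            self.full[self.latest] = list(rows)
        return self.latest

    def get(self, version):
        """Rebuild a version from the nearest earlier full snapshot."""
        self.downloads[version] = self.downloads.get(version, 0) + 1
        base = max(v for v in self.full if v <= version)
        rows = list(self.full[base])
        for v in range(base + 1, version + 1):
            rows = apply_diff(rows, self.diffs[v])
        # Popular versions get stored in full so later reads skip the replay.
        if self.downloads[version] >= self.promote_after:
            self.full[version] = list(rows)
        return rows
```

      A paper would then cite the version number returned by `commit`, and anyone calling `get` with that number gets the same rows back, however many edits have been made since.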

      Not too many people pay attention to Wikipedia changelogs. If only the current form of the data is easily visible, that's what most people -- especially amateurs and those with political motivations -- will use.

      It is perfectly fine for most people to use the most recent data. If someone is trying to build on my research, I would want them to take advantage of any new data that has been added since I did my work. Perhaps the new data invalidates my results. My version of the data could then be used to determine if my methodology was poor, or if it really was the new data that showed my findings were invalid.

      And on top of that, holding back new research tools because amateurs and politically motivated groups could misuse them is very scary indeed.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    4. Re:Editable scientific data? by DerekLyons · · Score: 1

      I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

      Um, have you never heard of versioning? It would be pretty trivial to add the statement "Used the XXX v3.5.1 dataset to perform these calculations" to your research paper.

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct. That's the basic flaw in Wikipedia, and one that must be fixed in order to use a system like it as a depository for scientific information rather than a hazy collection of stuff that one guy maintains is accurate in his opinion or that nobody can be bothered to maintain correct and up-to-date.

    5. Re:Editable scientific data? by Nutria · · Score: 1

      holding back new research tools because amateurs and politically motivated groups could misuse them is very scary indeed.

      An analogy: we hold back guns from four year olds -- even when we show it to them and say, "Very dangerous! Never touch!", but not from legally competent adults; when said four year old gets his hands on a gun, bad things can happen.

      Likewise, we should not hold back *copies* of data from the world. However, so as to protect the "chain of provenance", edit privileges should be limited in some way, so as to prevent abuse by sock puppets and the anonymous. Maybe something as simple as requiring editors to log in using a cryptographic certificate signed by a trusted third party which requires some form of official ID and manual verification.

      --
      "I don't know, therefore Aliens" Wafflebox1
    6. Re:Editable scientific data? by khallow · · Score: 1

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct.

      Repeatability was exactly the concern addressed. Having said that, one key difference appears to be that the data is just dumped rather than interpreted. That particular version isn't going to become more or less correct and complete just because my sock puppet army is at work.

      And what really can or should a content management system do here to verify correctness and completeness? I think the original insistence on repeatability is precisely because completeness and correctness is a hard problem beyond the scope of a content management system like Wikidata.

      I consider this much like Arxiv.org, the pre-print server where papers are dumped without evaluation of how scientifically valid they are. I believe the filter is that you have to either be active (submitting papers to the Arxiv on a regular basis) or referred to by someone who already has access. That simple process doesn't keep all of the crap off, but it does greatly improve the signal to noise.

    7. Re:Editable scientific data? by dkf · · Score: 1

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct.

      Nothing much ensures that the data is complete or correct now either, other than peer review over a long period of time by people who are wholly unconnected with the original work (and its funding). In fact, in some sciences you're not going to get complete data in a public venue anyway (some sciences work with data that in raw form can identify individual people; think medical research). Correctness is hard to evaluate; what does it even mean for raw data in the first place?

      But keeping versioned data does help with some types of analysis, such as working out whether a scientist's hypothesis was reasonable based on what data was available at the time, and whether that hypothesis still holds water or when it ceased to be good. It also makes it much easier to detect fraud, and you can use all the sorts of concepts developed for distributed source code management to make it all more comprehensible.

      Don't think "wikipedia for scientific data", think "github for scientific data". That's a much better model.
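      One way to read the "github for scientific data" idea is that each revision of a dataset gets an identifier derived from its content and from its parent revision, so a quiet edit to an old version shows up as a broken chain. The sketch below is a simplified illustration under that assumption; it is not Git's actual object format, and the `commit_id`, `append` and `verify` helpers and the record layout are hypothetical.

```python
import hashlib
import json


def commit_id(parent_id, author, rows):
    """Hash a revision's content together with its parent's identifier."""
    payload = json.dumps(
        {"parent": parent_id, "author": author, "rows": rows}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(history, author, rows):
    """Add a revision to the history and return its identifier."""
    parent = history[-1]["id"] if history else None
    history.append(
        {
            "id": commit_id(parent, author, rows),
            "parent": parent,
            "author": author,
            "rows": list(rows),
        }
    )
    return history[-1]["id"]


def verify(history):
    """Return True only if no stored revision was altered after the fact."""
    parent = None
    for rev in history:
        expected = commit_id(parent, rev["author"], rev["rows"])
        if rev["parent"] != parent or rev["id"] != expected:
            return False
        parent = rev["id"]
    return True
```

      Because every identifier depends on the one before it, changing a single reading in an old revision invalidates that revision and everything built on it. That is roughly the property that makes after-the-fact tampering easier to detect.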

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
  6. 1 Millionth User? by Mikkeles · · Score: 3, Funny

    Congratulations! You are the one-millionth user to log into our system. If there's anything special we can do for you, anything at all, don't hesitate to ask!

    I want no Beta, 15 mod points per day, and a pony!

    --
    Great minds think alike; fools seldom differ.
    1. Re:1 Millionth User? by martin-boundary · · Score: 2

      No, I'm the millionth user to log in. It says so right here on my screen! *I* want this imposter whipped with a pussy willow!

    2. Re:1 Millionth User? by david.given · · Score: 1

      Would the pony also have mod points?

  7. Re:I object. by martin-boundary · · Score: 1

    Data is racist.

    I agree. Just last week I was interviewing for an engineering job with a Nascar team, and all they could talk about was fuel data this, weight data that, etc. I told them that's not how I roll. I'm writing my congresscritter right now to stop this despicable behavior.

  8. Give it a chance by Okian+Warrior · · Score: 5, Insightful

    I can't be the only one who thinks that is a terribly bad idea...

    When I first heard about Wikipedia and the theory driving it I thought it was a terribly bad idea at the time... but ya know, I find it really useful. It's got lots of problems but on balance it's a lot more useful than problematic.

    We've identified many deep problems with scientific research on this very forum, and to my knowledge little progress has been made over the last decade.

    Can't we at least *try* different solutions?

    Where is it written(*) that the old ways are the best?

    (*) The script to Skyfall of course. I got that from Wikiquote.

    1. Re:Give it a chance by Nutria · · Score: 1

      We've identified many deep problems with scientific research on this very forum

      Most revolving around laziness and academic corruption. Allowing data (for example: historical weather gauge readings, or IQ scores, or any other data having to do with hot-button topics) to be edited is an invitation to socio-political fraud on an unheard-of scale.

      --
      "I don't know, therefore Aliens" Wafflebox1
  9. Yep. by Niet3sche · · Score: 1

    I and a co-author pitched this notion in 2006. We had pitched it as a smaller element of a "research match-maker" idea. And, man, were the academics violently opposed. No one saw value in the work and most felt either directly threatened or otherwise unsure how to objectively gauge the value of the contribution with author name and affiliation removed. It was depressing.

  10. Wikidata, not wikiPEDIA by vikingpower · · Score: 1

    Most of the commenters here did not even bother to catch that difference. RTFA, folks.

    --
    Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
  11. And then the grant ran out... by MakerDusk · · Score: 2

    The main problem with scientific data is retention. Often the results are kept, but the data that led to the results is long lost. Even 5 years later, it's hard to find the data. There is a reason for this: there's a lot! Regardless of their database size, most particle physics experiments can fill it in less than a day. It's not technologically feasible to gather the information into one system, at our current level of technology.

    While Wikipedia has editing and flame-war problems, this project would end with similar problems surrounding deletion. What do you keep? How do you know where the breakthroughs will be made: the ones that make revisiting old experiments and data necessary? One cannot predict the path inspiration will take. Who decides what gets deleted: an editor, an admin, by public vote? This is what will cause the project to fail out of the starting gate. In the event they do succeed, what happens when their funding runs out? We've already established that the main problem is from too much data for practical backup... that only leaves the inevitable fall into oblivion.

    In closing, I do offer a ray of hope: the time is fast approaching when we will reach the prerequisite technological level. Take a look at the work HP is currently doing: http://www.engadget.com/2010/0... This technology, at the optimal level, (I crunched some numbers, and it definitely would not be the case with the first iteration) can store all the world's data, and then some, on a device the size of a garbage can. At that point deletion, and all the problems outlined above, become nullified. Until we reach that level, this is a pipe dream, doomed to fail in a quagmire of politics.

  12. Already done, sorta by ausekilis · · Score: 1

    This is what the Texas Digital Library aims to do. Though it's not quite one big wiki, it actually is a push to archive and collaborate using various data types and formats.

  13. Like Geordi on Star Trek? by wellsdm · · Score: 1

    Is this kind of like the research database that Geordi can access (and update) with his tablet on Star Trek?