Slashdot Mirror


The Next Big Step For Wikidata: Forming a Hub For Researchers

The ed17 writes Wikidata, Wikimedia's free linked database that supplies Wikipedia and its sister projects, is gearing up to submit a grant application to the EU that would expand Wikidata's scope by developing it as a science hub. ... This proposal is significant because no other open collaborative project ... can connect the free databases in the world across disciplinary and linguistic boundaries. ...the project will be capable of providing a unique open service: for the first time, that will allow both citizens and professional scientists from any research or language community to integrate their databases into an open global structure, to publicly annotate, verify, criticize and improve the quality of available data, to define its limits, to contribute to the evolution of its ontology, and to make all this available to everyone, without any restrictions on use and reuse.

61 comments

  1. Jimmy Wales On Crack Again by Frosty+Piss · · Score: 1, Insightful

    Folks, Wikipedia is a starting place, but its ever-changing content contributed by whoever is not acceptable for academic references. This has been discussed before.

    No one references Encyclopedia Britannica in their master's thesis...

    --
    If you want news from today, you have to come back tomorrow.
    1. Re:Jimmy Wales On Crack Again by Livius · · Score: 2

      But this is about Wikidata.

    2. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      Troll

    3. Re:Jimmy Wales On Crack Again by Frosty+Piss · · Score: 1

      But this is about Wikidata.

      Yes, and my WordPress blog has an underlieing database, too.

      --
      If you want news from today, you have to come back tomorrow.
    4. Re:Jimmy Wales On Crack Again by Frosty+Piss · · Score: 0

      Troll

      Not "troll", Wikidrone, valid point.

      Sure, there is lots of data in Wikimedia's database, much if not all of it taken from the Internet. There may be many things we can learn from an ever-changing database of random information. I can see any number of master's and PhD theses that could come from the traffic analysis alone. But to say that Wikipedia has some special giant trove of human knowledge that could not be had from an analysis of the actual sources available on the Internet, well, I just wonder what WikiWorld you live in.

      You must have Admin credentials...

      --
      If you want news from today, you have to come back tomorrow.
    5. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 3, Insightful

      Put the pipe down, bro. Pointing out the obvious is cool and all, but kinda OT in this case.

      In case the room was too smokey to see your screen properly - from TFA; "...would expand Wikidata's scope by developing it as a science hub. The proposal, supported by more than 25 volunteers and half a dozen European institutions as project partners, aims to create a virtual research environment (VRE) that will enhance the project's capacity for freely sharing scientific data."

      We're not talking about Wikipedia, but something new that uses Wikidata as its core.

    6. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      The newer guys in my department not only didn't reference Wikipedia for the images they used in the research talks they gave in front of the entire department before being hired, but they all used the same images from Wikipedia (in the last 8 years, my department has hired 5 guys in the same area). Not that it mattered, because the old pharts in the department don't even bother with Wikipedia.

      (I swear I saw the same exact captcha (consent) before on this website. Anybody else?)

    7. Re:Jimmy Wales On Crack Again by martin-boundary · · Score: 1

      [...] and to make all this available to everyone, without any restrictions on use and reuse.

      The fundamental problem remains, however. Even if scientists curate the data honestly and comprehensively, what's to stop people from taking the material, editing/changing it, and publishing/claiming their version is correct? The only way to protect against this is to make the data read-only downstream, e.g. only credentialed scientists will get to create or modify data - and that's a pretty fundamental restriction on use and reuse.

      Basically, the idea seems contradictory.

    8. Re: Jimmy Wales On Crack Again by Frosty+Piss · · Score: 0

      Right now, except for obscure articles, you can't edit without getting reverted because of article "ownership" by some "in" editor or an Admin. Oh sure, you can correct bad spelling or grammar, but don't go beyond that without permission of the article "owner".

      If you want to go to the model of locked down and edited by professionals, there is an encyclopedia that has been around for a long time that works that way.

      --
      If you want news from today, you have to come back tomorrow.
    9. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      It's easy to put a reference on the article to the wikidata page. Very few articles already provide a (sometimes broken) reference to the data.
      Anyone can put random data somewhere and claim theirs is the right one, but if the article doesn't point to it they wouldn't fool many.

      It's cool to hate on Jimmy, but he hasn't done anything obviously wrong here yet. Give the man some time.

    10. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      Cock-Wad,

      I think it's about using their collaboration infrastructure and not its current content.

      Best.

    11. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      Just because you are on crack, doesn't mean everyone else is on crack.

    12. Re:Jimmy Wales On Crack Again by Bite+The+Pillow · · Score: 1

      Just read the fucking article this time. I was going to select bits to quote for you, but honestly either you are deliberately misunderstanding or don't possess the reading skills to make sense of words.

      Have someone read it to you, and ask them to use small words if that helps.

      This is not related to Wikipedia, aka "The encyclopedia that like 12 friends of Jimbo can edit" except in that special way where you are related to any random-ass member of humanity.

    13. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      From the summary:

      > Wikidata, Wikimedia's free linked database that supplies Wikipedia and its sister projects

      Gee, that's "nothing to do with Wikipedia". Not a bit, nope, not even a teensy bit. And "gasohol" has nothing to do with petroleum, didn't you hear? It's corn!!!

    14. Re:Jimmy Wales On Crack Again by Anonymous Coward · · Score: 0

      their collaboration infrastructure

      where a message forum is implemented as a textarea where you have to write in markup and can change what everybody else wrote? I don't think so.

  2. The fix is in by Anonymous Coward · · Score: 0

    That's it. They're going to give all editorial power to academe. The end result will inevitably be the standard statist multiculty world view uniformly applied.

  3. Researchers? by Anonymous Coward · · Score: 0

    [Original research? - Scheduled for deletion]

  4. I object. by Anonymous Coward · · Score: 0

    Data is racist.

    1. Re:I object. by martin-boundary · · Score: 1

      Data is racist.

      I agree. Just last week I was interviewing for an engineering job with a Nascar team, and all they could talk about was fuel data this, weight data that, etc. I told them that's not how I roll. I'm writing my congresscritter right now to stop this despicable behavior.

  5. I already do that by Paxinum · · Score: 0

    Whenever I find something useful in my research, I try to add it (and a reference) to Wikipedia. I sort of use it as a "personal" notebook.

  6. EU grant by manu0601 · · Score: 1

    I wish them good luck with the EU grant procedure. The procedures are such a maze that usually EU grant experts are required.

    1. Re:EU grant by Anonymous Coward · · Score: 0

      You realize that's the point of the EU grant system, right? Who do you think administers the grants?

    2. Re:EU grant by JanneM · · Score: 3, Interesting

      They have four partner universities and several other research institutions, most or all of whom already have one or more full-time staff dedicated to helping projects with their grant application process.

      Yes, EU grant applications are big and cumbersome - though the payoff is commensurate - but the process is not going to be the main hurdle. With all the available expertise at their disposal, if they can't navigate the application process then they're unlikely to successfully steer a major project over several years either.

      --
      Trust the Computer. The Computer is your friend.
  7. oh great, empire building at wikipedia by Anonymous Coward · · Score: 2, Interesting

    wikipedia started out as a web site where volunteers could edit articles, before entry into the nupedia website. nupedia is now dead. wikipedia has been engaging in bigger fund raising drives, and has more paid employees. Now it is trying to do more stuff to justify those more employees, just like when wikipedia spent a bunch of money trying to develop better wikipedia page editing software. I bet the heads of wikipedia now have bigger salaries.

    I would just like the number of humans maintaining wikipedia to be small once again, and not try to do anything else.

    1. Re:oh great, empire building at wikipedia by Frosty+Piss · · Score: 0, Troll

      wikipedia started out as a web site where volunteers could edit articles, before entry into the nupedia website. nupedia is now dead. wikipedia has been engaging in bigger fund raising drives, and has more paid employees. Now it is trying to do more stuff to justify those more employees, just like when wikipedia spent a bunch of money trying to develop better wikipedia page editing software. I bet the heads of wikipedia now have bigger salaries.

      I would just like the number of humans maintaining wikipedia to be small once again, and not try to do anything else

      Jimmy and his inner circle need to be able to fly places and do things. Jimmy lives in London, and, you know, needs to pay the rent and hold up the lifestyle. Again, first-class airline tickets are expensive. Get the lead out, dude...

      --
      If you want news from today, you have to come back tomorrow.
    2. Re:oh great, empire building at wikipedia by DerekLyons · · Score: 1

      I would just like the number of humans maintaining wikipedia to be small once again

      The number of humans editing Wikipedia is small - they've driven everyone else off.

  8. So basically....the internet in the 80s... by Anonymous Coward · · Score: 0

    lol.....

  9. Mod parent DOWN by Anonymous Coward · · Score: 1

    Haters gonna hate. That doesn't mean they're not stupid though. Get a grip. Wikipedia is an invaluable reference.

  10. Editable scientific data? by Nutria · · Score: 3, Insightful

    I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:Editable scientific data? by ranton · · Score: 5, Insightful

      I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

      Um, have you never heard of versioning? It would be pretty trivial to add the statement "Used the XXX v3.5.1 dataset to perform these calculations" to your research paper.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    2. Re:Editable scientific data? by Nutria · · Score: 1

      (1) Wikidata would either have to keep (many) multiple copies of possibly quite large data sets, or keep diffs. How much of a strain does it put on a busy server to generate a dataset from a huge original and lots of large diffs.

      (2) Not too many people pay attention to Wikipedia changelogs. If only the current form of the data is easily visible, that's what most people -- especially amateurs and those with political motivations -- will use.

      --
      "I don't know, therefore Aliens" Wafflebox1
    3. Re:Editable scientific data? by ranton · · Score: 1

      Wikidata would either have to keep (many) multiple copies of possibly quite large data sets, or keep diffs. How much of a strain does it put on a busy server to generate a dataset from a huge original and lots of large diffs.

      First off none of the problems you list are unmanageable; they just make it more expensive and more difficult to design. One technique could be to only store data sets from published papers. Versions cited in published papers and the latest data will be the ones most frequently accessed, and all other versions could be handled with diffs. They may even decide to only use diffs, but keep track of which versions are most frequently downloaded and store them in full. There are many more ways they could architect the system to handle these issues.

      Not too many people pay attention to Wikipedia changelogs. If only the current form of the data is easily visible, that's what most people -- especially amateurs and those with political motivations -- will use.

      It is perfectly fine for most people to use the most recent data. If someone is trying to build on my research, I would want them to take advantage of any new data that has been added since I did my work. Perhaps the new data invalidates my results. My version of the data could then be used to determine if my methodology was poor, or if it really was the new data that showed my findings were invalid.

      And on top of that, holding back new research tools because amateurs and politically motivated groups could misuse them is very scary indeed.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
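
The storage scheme described in the comment above (full copies for versions cited in papers, diffs for everything else) can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything Wikidata is known to implement: the names `VersionedDataset`, `make_delta`, `apply_delta` and the `cited` flag are hypothetical, and a dataset is assumed to be a list of hashable rows such as strings.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy/add operations against `old` (hypothetical helper)."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse rows old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))   # store only the new rows
        # "delete" needs no record: those rows are simply never copied
    return ops


def apply_delta(old, ops):
    """Rebuild the newer version from `old` plus a delta from make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionedDataset:
    """Keeps full snapshots for cited versions and deltas for the rest."""

    def __init__(self, rows):
        # Version 0 is always stored in full.
        self._entries = [("full", list(rows))]

    def commit(self, rows, cited=False):
        rows = list(rows)
        if cited:
            # Versions cited in a paper are stored whole for fast access.
            self._entries.append(("full", rows))
        else:
            previous = self.checkout(len(self._entries) - 1)
            self._entries.append(("delta", make_delta(previous, rows)))
        return len(self._entries) - 1

    def checkout(self, version):
        # Walk back to the nearest full snapshot, then replay deltas forward.
        base = version
        while self._entries[base][0] != "full":
            base -= 1
        rows = list(self._entries[base][1])
        for v in range(base + 1, version + 1):
            rows = apply_delta(rows, self._entries[v][1])
        return rows
```

Calling `commit(rows, cited=True)` for the version a paper references bounds how many deltas have to be replayed to reach it, which is the cost the parent comment worries about; uncited versions trade read time for storage.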
    4. Re:Editable scientific data? by DerekLyons · · Score: 1

      I can't be the only one who thinks that is a terribly bad idea... It would rip the guts right out of repeatability, and confidence that "this" is what $RESEARCHER found.

      Um, have you never heard of versioning? It would be pretty trivial to add the statement "Used the XXX v3.5.1 dataset to perform these calculations" to your research paper.

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct. That's the basic flaw in Wikipedia, and one that must be fixed in order to use a system like it as a depository for scientific information rather than a hazy collection of stuff that one guy maintains is accurate in his opinion or that nobody can be bothered to maintain correct and up-to-date.

    5. Re:Editable scientific data? by Nutria · · Score: 1

      holding back new research tools because amateurs and politically motivated groups could misuse them is very scary indeed.

      An analogy: we hold back guns from four year olds -- even when we show it to them and say, "Very dangerous! Never touch!", but not from legally competent adults; when said four year old gets his hands on a gun, bad things can happen.

      Likewise, we should not hold back *copies* of data from the world. However, so as to protect the "chain of provenance", edit privileges should be limited in some way, so as to prevent abuse by sock puppets and the anonymous. Maybe something as simple as requiring editors to log in using a cryptographic certificate signed by a trusted third party which requires some form of official ID and manual verification.

      --
      "I don't know, therefore Aliens" Wafflebox1
    6. Re:Editable scientific data? by khallow · · Score: 1

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct.

      Repeatability was exactly the concern addressed. Having said that, one key difference appears to be that the data is just dumped rather than interpreted. That particular version isn't going to become more or less correct and complete just because my sock puppet army is at work.

      And what really can or should a content management system do here to verify correctness and completeness? I think the original insistence on repeatability is precisely because completeness and correctness is a hard problem beyond the scope of a content management system like Wikidata.

      I consider this much like arXiv.org, the pre-print server where papers are dumped without evaluation of how scientifically valid they are. I believe the filter is that you have to either be active (submitting papers to arXiv on a regular basis) or referred to by someone who already has access. That simple process doesn't keep all of the crap off, but it does greatly improve the signal to noise.

    7. Re:Editable scientific data? by dkf · · Score: 1

      Versioning only ensures that anyone who subsequently performs the calculations will reach the same result - it does not verify the data is complete or correct.

      Nothing much ensures that the data is complete or correct now either, other than peer review over a long period of time by people who are wholly unconnected with the original work (and its funding). In fact, in some sciences you're not going to get complete data in a public venue anyway (some sciences work with data that in raw form can identify individual people; think medical research). Correctness is hard to evaluate; what does it even mean for raw data in the first place?

      But keeping versioned data does help with some types of analysis, such as working out whether a scientist's hypothesis was reasonable based on what data was available at the time, and whether that hypothesis still holds water or when it ceased to be good. It also makes it much easier to detect fraud, and you can use all the sorts of concepts developed for distributed source code management to make it all more comprehensible.

      Don't think "wikipedia for scientific data", think "github for scientific data". That's a much better model.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
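
One concept borrowed from distributed source code management, as the comment above suggests, is content-addressed history: each version's identifier is a hash of its data plus its parent's identifier, so a silent edit to an old version becomes detectable. The sketch below is a simplified illustration, not how git or Wikidata actually store anything; `commit_id`, `append_commit` and `verify` are hypothetical names, and rows are assumed to be JSON-serializable.

```python
import hashlib
import json


def commit_id(rows, parent, author):
    """Hash the data together with the parent id, chaining versions together."""
    payload = json.dumps(
        {"rows": rows, "parent": parent, "author": author},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_commit(history, rows, author):
    """Append a new version to `history` (a list of dicts) and return its id."""
    parent = history[-1]["id"] if history else None
    entry = {
        "id": commit_id(rows, parent, author),
        "parent": parent,
        "author": author,
        "rows": rows,
    }
    history.append(entry)
    return entry["id"]


def verify(history):
    """Return True only if every entry still matches its recorded hash chain."""
    expected_parent = None
    for entry in history:
        if entry["parent"] != expected_parent:
            return False
        recomputed = commit_id(entry["rows"], entry["parent"], entry["author"])
        if entry["id"] != recomputed:
            return False
        expected_parent = entry["id"]
    return True
```

A paper could then cite the id returned by `append_commit` instead of a free-form version string; anyone holding the history can re-run `verify` to check that the cited version and everything before it is unaltered. This addresses tampering only, not whether the data was correct in the first place.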
  11. 1 Millionth User? by Mikkeles · · Score: 3, Funny

    Congratulations! You are the one-millionth user to log into our system. If there's anything special we can do for you, anything at all, don't hesitate to ask!

    I want no Beta, 15 mod points per day, and a pony!

    --
    Great minds think alike; fools seldom differ.
    1. Re:1 Millionth User? by martin-boundary · · Score: 2

      No, I'm the millionth user to log in. It says so right here on my screen! *I* want this imposter whipped with a pussy willow!

    2. Re:1 Millionth User? by david.given · · Score: 1

      Would the pony also have mod points?

  12. Re:April Fools! by Anonymous Coward · · Score: 0

    Talking about "Wiki" is kind of like talking about "Blog". I used to read Blog, but now they hardly update it anymore.

  13. Give it a chance by Okian+Warrior · · Score: 5, Insightful

    I can't be the only one who thinks that is a terribly bad idea...

    When I first heard about Wikipedia and the theory driving it I thought it was a terribly bad idea at the time... but ya know, I find it really useful. It's got lots of problems but on balance it's a lot more useful than problematic.

    We've identified many deep problems with scientific research on this very forum, and to my knowledge little progress has been made over the last decade.

    Can't we at least *try* different solutions?

    Where is it written(*) that the old ways are the best?

    (*) The script to Skyfall of course. I got that from Wikiquote.

    1. Re:Give it a chance by Nutria · · Score: 1

      We've identified many deep problems with scientific research on this very forum

      Most revolving around laziness and academic corruption. Allowing data (for example: historical weather gauge readings, or IQ scores, or any other data having to do with hot-button topics) to be edited is an invitation to socio-political fraud on an unheard-of scale.

      --
      "I don't know, therefore Aliens" Wafflebox1
  14. How is the parent not true? by Frosty+Piss · · Score: 0

    How is the parent not true?

    --
    If you want news from today, you have to come back tomorrow.
  15. Mod parent up by Anonymous Coward · · Score: 0

    Agreed. Mod the guy who says to mod down up, but mod that post's parent down

    1. Re:Mod parent up by Anonymous Coward · · Score: 0

      Mod the guy who says to mod the guy who says to mod down up, but mod that post's parent down up, mod the guy who says to mod down up, but mod that post's parent down.

  16. What's an 'underlieing' ? by Anonymous Coward · · Score: 0

    ... an underlieing database ...

    Pardon moi, English is not my mother tongue

    Can someone tell me what an 'underlieing' is, please?

    Merci !

  17. I can see it now... by Anonymous Coward · · Score: 0

    A disaster in the making, if the shoddy editing efforts of the main Wikipedia pages are anything to go by (Those hundreds of pics that had Hitler edited in and weren't discovered for how long?).

    I can see now what is going to happen with little power-trippers carving out their niches (just like now), edit wars, malicious inserts of false information, and probably more than a bit of it will be subject to SJW-Social Darwinism (STEM is evil!).

    I can also see this entire thing clashing with the Wikimedia "no original research" mantra when it comes to articles and their other information services. Good fucking luck.

  18. Yep. by Niet3sche · · Score: 1

    I and a co-author pitched this notion in 2006. We had pitched it as a smaller element of a "research match-maker" idea. And, man, were the academics violently opposed. No one saw value in the work and most felt either directly threatened or otherwise unsure how to objectively gauge the value of the contribution with author name and affiliation removed. It was depressing.

  19. Every... single... page by Anonymous Coward · · Score: 0

    This article possibly contains original research. Relevant discussion may be found on the talk page. Please improve it by verifying the claims made and adding inline citations. Statements consisting only of original research should be removed. (January 2015)

    1. Re:Every... single... page by Anonymous Coward · · Score: 0

      This article may meet Wikipedia's criteria for speedy deletion, due to original research. If this article does not meet the criteria for speedy deletion, or you intend to fix it, please remove this notice, but do not remove this notice from pages that you have created yourself.

  20. missing words by Anonymous Coward · · Score: 0

    to publicly annotate, verify, criticize, improve, "track and censor" the available data.

  21. Wikidata, not wikiPEDIA by vikingpower · · Score: 1

    Most of the commenters here did not even bother to catch that difference. RTFA, folks.

    --
    Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    1. Re:Wikidata, not wikiPEDIA by Anonymous Coward · · Score: 0

      WikiMEDIA owns Wikidata AND wikipedia.

      Try reading about the involved companies instead of trying to just work from TFA.

  22. And then the grant ran out... by MakerDusk · · Score: 2

    The main problem with scientific data is retention. Often the results are kept, but the data that led to the results is long lost. Even 5 years later, it's hard to find the data. There is a reason for this: there's a lot! Regardless of their database size, most particle physics experiments can fill it in less than a day. It's not technologically feasible to gather the information into one system, at our current level of technology.

    While Wikipedia has editing and flame-war problems, this project would end with similar problems surrounding deletion. What do you keep? How do you know where the breakthroughs will be made: the ones that make revisiting old experiments and data necessary? One cannot predict the path inspiration will take. Who decides what gets deleted: an editor, an admin, a public vote? This is what will cause the project to fail out of the starting gate. In the event they do succeed, what happens when their funding runs out? We've already established that the main problem is too much data for practical backup... that only leaves the inevitable fall into oblivion.

    In closing, I do offer a ray of hope: the time is fast approaching when we will reach the prerequisite technological level. Take a look at the work HP is currently doing: http://www.engadget.com/2010/0... This technology, at the optimal level, (I crunched some numbers, and it definitely would not be the case with the first iteration) can store all the world's data, and then some, on a device the size of a garbage can. At that point deletion, and all the problems outlined above, become nullified. Until we reach that level, this is a pipe dream, doomed to fail in a quagmire of politics.

  23. Already done, sorta by ausekilis · · Score: 1

    This is what the Texas Digital Library aims to do. Though it's not quite one big wiki, it actually is a push to archive and collaborate using various data types and formats.

  24. Like Geordi on Star Trek? by wellsdm · · Score: 1

    Is this kind of like the research database that Geordi can access (and update) with his tablet on Star Trek?

  25. Re:April Fools! by Anonymous Coward · · Score: 0

    I still enjoy reading slash from time to time.