Detecting Conflict-Of-Interest on the Semantic Web
CexpTretical writes "At the 15th International WWW Conference in Edinburgh, Scotland, the Refereed Track on the Semantic Web accepted many thorough and interesting academic papers on Semantic Web research, on subjects related to where the Web is in the Semantic Web.
One such paper, Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection, which was nominated for the Best Paper Award, hits on the whole subject of validation and/or verification in the brave world of so-called "Web 3.0" topologies/frameworks/architectures.
The paper describes a "Semantic Web application that detects Conflict of Interest (COI) relationships"."
This is an excellent paper that highlights many of the issues that will be encountered as the naive realists promoting the semantic web hit the hard fact that data quality is poor and identification is hard. From the paper's conclusions:
The goal of full/complete automation is some years away. Currently, quality and availability of data is often a key challenge given the limited number of high quality and useful data sources. Significant work is required in certain tasks, such as entity disambiguation.
As a practical tool the Semantic Web has all of the problems that no-fly lists have. People share names with each other, and one individual may appear under multiple names. Datasets are radically incomplete, and an awareness of the possible uses to which data may be put will encourage the less scrupulous amongst us to deliberately devalue datasets by including misleading or incomplete information.
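To make the naming problem concrete, here is a minimal Python sketch of conflict-of-interest detection by co-authorship, assuming a toy dataset in which authors are identified only by name strings. The records, names, and the helpers `coauthors_by_name` and `naive_coi` are hypothetical illustrations, not the method from the paper. Because the lookup is keyed on names, one shared name produces a false positive and one alias produces a false negative.

```python
from collections import defaultdict

# Hypothetical publication records: (paper_id, author name strings).
# There is no stable person identifier, only the names as printed.
PAPERS = [
    ("p1", ["J. Smith", "A. Kumar"]),
    ("p2", ["J. Smith", "L. Chen"]),       # a different person, also "J. Smith"
    ("p3", ["Robert Jones", "M. Garcia"]),
]


def coauthors_by_name(papers):
    """Build a co-authorship graph keyed on raw author name strings."""
    graph = defaultdict(set)
    for _, authors in papers:
        for a in authors:
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph


def naive_coi(reviewer, author, graph):
    """Hypothetical rule: flag a potential COI if the two names have co-authored."""
    return author in graph.get(reviewer, set())


graph = coauthors_by_name(PAPERS)
# False positive if the reviewer is the p2 J. Smith, who never worked with
# A. Kumar: the shared name string merges two people into one node.
print(naive_coi("J. Smith", "A. Kumar", graph))    # True
# False negative, assuming "Bob Jones" is Robert Jones under another name.
print(naive_coi("Bob Jones", "M. Garcia", graph))  # False
```

Fixing either failure needs a stable identifier per person, which is the entity disambiguation that the paper's conclusions single out as needing significant work.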
Even without deliberate poisoning of the data, it is doubtful that standard vocabularies will be used in sufficiently consistent ways by various institutions and individuals to create homogeneous (and therefore useful) datasets. For example, people who run multi-centre cancer trials expend an enormous amount of energy on data curation and auditing, which includes actual site visits to institutions and periodic audits of data, as well as centralized control of what gets into the final database. And this is for data collected by cancer centres and cancer docs who are nominally committed to following precise protocols and have been given training in what the fields in the various forms are supposed to mean. Yet centres can and do get delisted from studies due to lack of compliance.
The same thing can be seen in nominally standardized data formats like MAGE-ML and its cousins: industry-standard XML-based languages for marking up genomic datasets. There are specific elements that are intended for particular pieces of data, but a depressing amount of the time companies decide to put the really important stuff in a catch-all element, because "it's easier" than understanding the well-documented and clearly defined format.
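A validator can at least detect this pattern. The sketch below assumes a hypothetical, simplified markup in which a `Measurement` element has dedicated child elements plus a free-text `Description` catch-all; these element names and the function `catch_all_abuse` are invented for illustration and are not the actual MAGE-ML schema. It flags records whose dedicated fields are all empty or missing while the catch-all carries text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT real MAGE-ML element names.
SPECIFIC_FIELDS = ("Organism", "Tissue", "Protocol")

DOC = """
<Experiment>
  <Measurement id="m1">
    <Organism>Homo sapiens</Organism>
    <Tissue>liver</Tissue>
    <Protocol>P-17</Protocol>
  </Measurement>
  <Measurement id="m2">
    <Description>human liver, protocol P-17, see lab notebook</Description>
  </Measurement>
</Experiment>
"""


def catch_all_abuse(xml_text):
    """Return ids of measurements whose dedicated fields are empty or missing
    while the catch-all Description element carries text."""
    root = ET.fromstring(xml_text.strip())
    flagged = []
    for m in root.iter("Measurement"):
        # findtext returns None for a missing element, "" for an empty one.
        specific = [(m.findtext(f) or "").strip() for f in SPECIFIC_FIELDS]
        catch_all = (m.findtext("Description") or "").strip()
        if catch_all and not any(specific):
            flagged.append(m.get("id"))
    return flagged


print(catch_all_abuse(DOC))  # ['m2']
```

Such a check can only report the problem; pulling the organism, tissue, and protocol back out of the free text remains manual work.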
Likewise, medical images created in DICOM format by major equipment manufacturers not infrequently have clear and blatant violations of the DICOM standard, despite over a decade of effort to ensure a reasonable level of compliance. And these are not subtle violations, but missing required fields, or incorrect data in required fields ("because all our images are 512x512 why should we have to fill in the width and height all the time? It's easier to just leave them zero.")
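The DICOM case is even easier to catch mechanically. Here is a minimal sketch, assuming the header has already been parsed into a plain dictionary keyed by the DICOM attribute keywords `Rows` and `Columns`. The dictionary form and `check_image_dimensions` are hypothetical simplifications, and treating zero as invalid is an assumption, on the grounds that a real image has at least one row and one column.

```python
# Hypothetical, simplified header: a plain dict keyed by DICOM attribute
# keywords. Real code would read headers with a DICOM library instead.
REQUIRED_POSITIVE_INT = ("Rows", "Columns")


def check_image_dimensions(header):
    """Return a list of problems with the image-dimension attributes."""
    problems = []
    for keyword in REQUIRED_POSITIVE_INT:
        if keyword not in header:
            problems.append(f"{keyword}: missing")
        # Assumption: zero or a non-integer value is never a valid dimension.
        elif not isinstance(header[keyword], int) or header[keyword] <= 0:
            problems.append(f"{keyword}: invalid value {header[keyword]!r}")
    return problems


good = {"Rows": 512, "Columns": 512}
lazy = {"Rows": 0, "Columns": 0}   # "easier to just leave them zero"
sloppier = {"Columns": 512}        # Rows omitted entirely

print(check_image_dimensions(good))      # []
print(check_image_dimensions(lazy))      # ['Rows: invalid value 0', 'Columns: invalid value 0']
print(check_image_dimensions(sloppier))  # ['Rows: missing']
```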
People are stupid and lazy. I know I am. And we use the same words to mean different things, and different words to mean the same thing. The Semantic Web requires people to be smart and hardworking, and to use standardized vocabularies in standardized ways. Decades of failed or at best partially successful data exchange protocols strongly suggest that these requirements will not be fulfilled.
A fairly standardized vocabulary actually exists in Wikipedia (markup language, templates, categories).
Here is a list of links that try to combine Wikipedia and the Semantic Web:
http://wiki.ontoworld.org/index.php/Semantic_Wiki
http://wiki.ontoworld.org/wiki/Sites_using_Semant
http://en.wikipedia.org/wiki/Wikipedia_talk:Seman
http://www2006.org/programme/item.php?id=4039
http://meta.wikimedia.org/wiki/Semantic_MediaWiki
It's all very well for these hucksters to peddle the semantic web to funding bodies who don't know any better, as long as they don't start pretending it's anything other than the new FIPA: a collection of committees generating specifications that the world will continue to ignore.