Crowd-Sourced Experiment To Map All Human Skills

Posted by samzenpus on Wednesday November 12, 2014 @10:34AM from the what-can-you-do? dept.

spadadot writes "A France-based startup has just launched a website that lets you add your skills to a comprehensive map of human skills. As their website puts it: 'We aim to build the largest, most accurate, multilingual skills database ever made, by allowing a diverse and skillful community to contribute their individual skills to the global map.' The ontology is simple: skills can have zero or more sub-skills. Every new skill is available in all supported languages (only English and French at the moment). The crowdsourced data is free for non-commercial use."
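The ontology described in the summary is a plain tree. Here is a minimal Python sketch of that model. It assumes a hypothetical `Skill` class, not the site's actual schema, in which each node stores one display name per language code plus a list of sub-skills. The example branch is illustrative only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical node: one display name per language, zero or more sub-skills.
    names: dict[str, str]                          # language code -> display name
    children: list[Skill] = field(default_factory=list)

    def add(self, child: Skill) -> Skill:
        """Attach a sub-skill and return it so branches can be built in chains."""
        self.children.append(child)
        return child

    def path_to(self, name: str, lang: str = "en") -> list[str] | None:
        """Depth-first search; return the names from this node down to the match."""
        here = self.names.get(lang, "?")
        if here == name:
            return [here]
        for child in self.children:
            sub = child.path_to(name, lang)
            if sub is not None:
                return [here] + sub
        return None


# Illustrative data only.
root = Skill({"en": "Skills", "fr": "Compétences"})
tech = root.add(Skill({"en": "Technology", "fr": "Technologie"}))
cs = tech.add(Skill({"en": "Computer Science", "fr": "Informatique"}))
tech.add(Skill({"en": "Aerospace", "fr": "Aérospatiale"}))
tech.add(Skill({"en": "Engineering", "fr": "Ingénierie"}))
cs.add(Skill({"en": "C++ programming", "fr": "Programmation C++"}))

print(root.path_to("C++ programming"))
# ['Skills', 'Technology', 'Computer Science', 'C++ programming']
```

Every lookup in this model is a walk down a single branch. Finding a skill therefore assumes you know which parent it was filed under.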

1 of 70 comments

Wrong structure by swillden · 2014-11-12 13:40 · Score: 4, Insightful

They're trying to model a database of human skills as a hierarchy. That's the most common sort of categorization system we design, because it's simple and logical, but lots of things simply don't fit such a model. Arguably, it's not even a particularly natural model for humans, since our internal category systems are generally prototype-based.
But in this case, the real problem is that whatever clear divisions you define to segregate skills into classes will be essentially arbitrary. Skills shade into one another through shared elements. Some pairs of skills are deeply similar because they involve the same sorts of processes, so a person who knows one can easily learn the other even if the two are used in completely different contexts; the taxonomy as it stands will wrongly separate them. Ideally, you want a skill map that identifies skills with high degrees of similarity, between which people can transition easily, regardless of context (I suppose I'm presuming an application of the map that may not be intended, but it seems like a pretty darned valuable one).
There are also real issues of granularity. Take C++ programming: you can be a competent programmer without knowing anything about template metaprogramming, and you can be an expert metaprogrammer without being able to write useful code. Think about it for a moment and you can come up with a hundred sub-skills for any skill. Of course, you can decide to cut the tree off arbitrarily at a particular level, and sometimes that level is obvious... but I strongly suspect that different people will disagree on where those "obvious" cut-offs are.
Building the data up the ad-hoc way they're going about it is going to lead to lots of other oddities. For example, right now under "Technology" there are three categories: "Computer Science", "Aerospace" and "Engineering". Umm, what? We can argue about whether software engineers are real engineers, but aerospace engineers definitely are. Do those three things really belong at the same level? Clearly not, and no individual taxonomist would put them there. I hope they have some way for the crowd (or someone) to restructure the tree; otherwise the inevitably flawed and inconsistent hierarchical taxonomy is also going to be silly.
I'm not saying that their idea is impossible; I'm saying that it doesn't fit within a structure of classical categories. Instead, it should be modeled as a graph, with multiple relationships between nodes and the edges labeled to indicate the nature of each relationship. Of course, this will make it impossible to find a skill in the graph except by searching, but that's going to be the case anyway. Except in the most obvious cases, people won't know which branches of the tree to follow to find a given skill. If you're going to start by searching anyway, a graph makes it easier to find what you want: you can search for something related and navigate from there to precisely what you wanted (assuming it's present and properly connected).
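Here is a minimal sketch of the labeled-edge graph proposed above, with search as the entry point and navigation along edges afterwards. The `SkillGraph` class, its relationship labels (`has-part`, `similar-to`, `kind-of`) and the example skills are all hypothetical.

```python
from __future__ import annotations
from collections import defaultdict


class SkillGraph:
    """Skills as nodes; every directed edge carries a label naming the relationship."""

    def __init__(self) -> None:
        # skill name -> list of (relationship label, related skill)
        self.edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def relate(self, src: str, label: str, dst: str) -> None:
        """Add a labeled edge src -> dst; dst becomes a node even with no outgoing edges."""
        self.edges[src].append((label, dst))
        self.edges.setdefault(dst, [])

    def search(self, text: str) -> list[str]:
        """Entry point: case-insensitive substring match over skill names."""
        needle = text.lower()
        return sorted(name for name in self.edges if needle in name.lower())

    def neighbours(self, skill: str, label: str | None = None) -> list[tuple[str, str]]:
        """Outgoing edges from a skill, optionally filtered to one relationship type."""
        return [(lbl, dst) for lbl, dst in self.edges.get(skill, [])
                if label is None or lbl == label]


# Hypothetical example data.
g = SkillGraph()
g.relate("C++ programming", "has-part", "Template metaprogramming")
g.relate("C++ programming", "similar-to", "Rust programming")
g.relate("Aerospace engineering", "kind-of", "Engineering")
g.relate("Software engineering", "kind-of", "Engineering")

for hit in g.search("c++"):          # search to find a starting point...
    print(hit, g.neighbours(hit))    # ...then navigate along labeled edges
```

Unlike the tree, a node here can sit under any number of others through different relationships. "Aerospace engineering" and "Software engineering" can both point at "Engineering" without either being forced into a single branch.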
I think there'd also be a lot of value in jump-starting (or perhaps refining) crowd-sourced data with automated analysis and clustering, derived from relevant documents. But the approach to collecting and building the data is less important than getting the data model right.
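The following sketch illustrates one simple way to do the automated jump-start. It is not necessarily what the site does or what the commenter had in mind. The approach counts how often pairs of skills are mentioned together across a set of documents, keeps pairs that co-occur at least `min_count` times, and takes connected components of the result as clusters (single-linkage clustering via union-find). The document sets below stand in for something like job postings and are invented.

```python
from __future__ import annotations
from collections import Counter
from itertools import combinations


def cooccurrence_clusters(docs: list[set[str]], min_count: int = 2) -> list[set[str]]:
    """Group skills mentioned together in at least `min_count` documents.

    Pairs at or above the threshold become edges; connected components of
    that graph (found with union-find) are returned as clusters.
    """
    pair_counts: Counter[tuple[str, str]] = Counter()
    for skills in docs:
        pair_counts.update(combinations(sorted(skills), 2))

    parent = {s: s for skills in docs for s in skills}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)        # merge the two components

    groups: dict[str, set[str]] = {}
    for s in parent:
        groups.setdefault(find(s), set()).add(s)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented example documents, each reduced to the set of skills it mentions.
docs = [
    {"C++", "Debugging", "Linux"},
    {"C++", "Debugging"},
    {"Welding", "Blueprint reading"},
    {"Welding", "Blueprint reading", "Linux"},
    {"Linux", "Debugging"},
]
print(cooccurrence_clusters(docs))
```

The threshold matters. With `min_count=1`, the single "Linux" mention in the welding document is enough to merge everything into one cluster. A crowd-sourced pass would then be needed to refine the automated output.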

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.