Slashdot Mirror


Exploring the Relationships Between Tech Skills (Visualization)

Nerval's Lobster writes: Simon Hughes, Dice's Chief Data Scientist, has put together an experimental visualization that explores how tech skills relate to one another. In the visualization, every circle or node represents a particular skill; colors designate communities that coalesce around skills. Try clicking "Java", for example, and notice how many other skills accompany it (a high-degree node, as graph theory would call it). As a popular skill, it appears to be present in many communities: Big Data, Oracle Database, System Administration, Automation/Testing, and (of course) Web and Software Development. You may or may not agree with some relationships, but keep in mind, it was all generated in an automatic way by computer code, untouched by a human. Building it started with Gephi, an open-source network analysis and visualization software package, by importing a pair-wise comma-separated list of skills and their similarity scores (as Simon describes in his article) and running a number of analyses: Force Atlas layout to draw a force-directed graph, Avg. Path Length to calculate the Betweenness Centrality that determines the size of a node, and finally Modularity to detect communities of skills (again, color-coded in the visualization). The graph was then exported as an XML graph file (GEXF) and converted to JSON format with two sets of elements: Nodes and Links. "We would love to hear your feedback and questions," Simon says.
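As a rough illustration of the data flow the summary describes (a pair-wise comma-separated list of skills and similarity scores going in, a JSON document with nodes and links coming out), here is a minimal Python sketch. It is not Dice's code and does not reproduce Gephi's layout, betweenness, or modularity steps; it only counts each node's degree, the property the summary points to for "Java". The sample rows, the `build_graph` function, the `min_score` parameter, and the JSON key names are all assumptions made for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample input: skill_a,skill_b,similarity_score
SAMPLE_CSV = """java,spring,0.82
java,hadoop,0.61
java,oracle,0.55
hadoop,hive,0.77
"""


def build_graph(csv_text, min_score=0.0):
    """Turn pair-wise skill similarities into a nodes/links structure."""
    degree = defaultdict(int)
    links = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        source, target = row[0].strip(), row[1].strip()
        score = float(row[2])
        if score < min_score:
            continue  # drop weak relationships
        links.append({"source": source, "target": target, "value": score})
        # Each link adds one to the degree of both of its endpoints.
        degree[source] += 1
        degree[target] += 1
    nodes = [{"id": name, "degree": deg} for name, deg in sorted(degree.items())]
    return {"nodes": nodes, "links": links}


graph = build_graph(SAMPLE_CSV)
print(json.dumps(graph, indent=2))
```

In this toy input "java" ends up with the highest degree because it appears in three of the four pairs, which is the sense in which the summary calls it a high-degree node.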

1 of 65 comments

  1. Buzzword association by tomhath · Score: 3, Interesting
    I had to follow the link to Hughes' report to find how he created the list of inputs:

    by importing a pair-wise comma-separated list of skills and their similarity scores ...

    we’re generating that automatically from job descriptions posted on our site.

    So what this really shows is how often the same two buzzwords appear together in a job description posted on Dice.

    I found another comment in his report interesting:

    We also tried using the resume dataset, but the results were of a lower quality,

    I assume by "lower quality" he really means "people list every buzzword they can think of on the resumes posted on Dice".

    Given the inputs I wouldn't expect any surprises in the results. But that said, it's an interesting project and they did a very nice job with the visualization.
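The commenter's reading, that the similarity scores reflect how often two skill terms appear in the same job description, can be sketched as a simple co-occurrence count. Whether Dice computes its scores this way is not stated in the summary, so treat this as an assumption; the skill vocabulary, the sample postings, and the `cooccurrence` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical skill vocabulary and job postings, for illustration only.
SKILLS = {"java", "spring", "hadoop", "sql"}
POSTINGS = [
    "Java developer with Spring and SQL experience",
    "Hadoop engineer, Java a plus",
    "SQL analyst with Java",
]


def cooccurrence(postings, skills):
    """Count how many postings mention each pair of skills together."""
    pair_counts = Counter()
    for text in postings:
        # Crude tokenization: split on whitespace, trim punctuation, lowercase.
        words = {w.strip(".,").lower() for w in text.split()}
        present = sorted(words & skills)
        # Sorting keeps each pair in one canonical order, e.g. ("java", "sql").
        pair_counts.update(combinations(present, 2))
    return pair_counts


counts = cooccurrence(POSTINGS, SKILLS)
for pair, n in counts.most_common():
    print(pair, n)
```

A resume that lists every term its author can think of would push nearly every pair count up together, which is one way to read the "lower quality" remark quoted above.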