Slashdot Mirror


Machine-Learning Algorithm Ranks the World's Most Notable Authors

HughPickens.com writes: Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell's approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

Riddell's algorithm begins with the Wikipedia entries of all authors in the English-language edition, more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on. This produces a "public domain ranking" of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304, while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell's new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf's work than Amisani's. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out T. S. Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
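To make the idea concrete, here is a minimal sketch of how a ranking like this could be built. It is not Riddell's actual model, which the summary does not describe. Several assumptions are made. It uses plain logistic regression as a stand-in. The `Author` record, the log scaling of each feature, and the training label (1 if an author's work was already a focus of digitization, 0 otherwise) are all hypothetical. The features are the ones the summary lists, and the `top_for_death_year` helper mirrors the "who died in 1965" example.

```python
import math
from dataclasses import dataclass


@dataclass
class Author:
    """Hypothetical record holding the per-article features the summary lists."""
    name: str
    death_year: int
    article_length: float       # characters in the Wikipedia article
    article_age_days: float     # days since the article was created
    views_per_day: float        # estimated page views per day
    days_since_revision: float  # days since the article was last edited


def features(a: Author) -> list[float]:
    # Assumed transform: log-scale the heavy-tailed counts and negate staleness,
    # so a recently revised article gets a larger value.
    return [
        1.0,  # bias term
        math.log1p(a.article_length),
        math.log1p(a.article_age_days),
        math.log1p(a.views_per_day),
        -math.log1p(a.days_since_revision),
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], a: Author) -> float:
    # Linear score; a higher value means a more "notable" author under this model.
    return sum(wi * xi for wi, xi in zip(w, features(a)))


def train(authors: list[Author], labels: list[int],
          lr: float = 0.01, epochs: int = 2000) -> list[float]:
    """Fit logistic-regression weights by batch gradient descent.

    labels[i] is a hypothetical 1/0 flag: was author i's work already a focus
    of past digitization efforts?
    """
    xs = [features(a) for a in authors]
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def rank(authors: list[Author], w: list[float]) -> dict[str, int]:
    # Rank 1 = the author the model considers most likely to be a focus.
    ordered = sorted(authors, key=lambda a: score(w, a), reverse=True)
    return {a.name: i + 1 for i, a in enumerate(ordered)}


def top_for_death_year(authors: list[Author], w: list[float],
                       year: int, k: int = 3) -> list[str]:
    # Restrict to authors who died in `year`, then return the k best-scoring names.
    same_year = [a for a in authors if a.death_year == year]
    same_year.sort(key=lambda a: score(w, a), reverse=True)
    return [a.name for a in same_year[:k]]
```

The log scaling is one simple way to keep a handful of extremely popular articles from dominating the score. Any real implementation would need its own training labels and a model validated against them.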

1 of 55 comments

  1. Life + 50 years almost everywhere by Katatsumuri · Score: 5, Interesting

    I quickly checked Wikipedia, and most countries seem to stick with at least a "life + 50 years" term. That is a great achievement of the lobbyists.

    Some island nations seem to have no known copyright legislation, but they are still usually parties to some limiting international treaties and also have similar restrictions under other names ("unauthorized copying", etc.).

    Seriously, is there no place on Earth with more reasonable terms?