Slashdot Mirror


Machine-Learning Algorithm Ranks the World's Most Notable Authors

HughPickens.com writes: Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell's approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

Riddell's algorithm begins with the Wikipedia entries of all authors in the English-language edition, more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on. This produces a "public domain ranking" of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304, while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell's new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf's work than Amisani's. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out T. S. Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
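The summary names the inputs (article length, article age, estimated views per day, time since the last revision) and the output (a ranking), but not the model in between. The Python sketch below is one hypothetical way to fill that gap and is not Riddell's published method. It assumes a logistic-regression model trained on made-up labels marking authors whose work has already drawn attention, then scores only the authors who died in the relevant year. The `Author` fields, `NotabilityModel`, `rank_entering`, and the life-plus-50 default in `public_domain_year` are all illustrative assumptions.

```python
"""Hypothetical sketch of a "public domain ranking": learn which Wikipedia
features predict past attention, then score and rank other authors.
Not Riddell's actual model; the model choice, features and labels are assumed."""
import math
from dataclasses import dataclass


@dataclass
class Author:
    name: str
    death_year: int
    article_length: float     # characters in the Wikipedia article
    article_age_days: float   # days since the article was created
    views_per_day: float      # estimated page views per day
    days_since_edit: float    # time elapsed since the last revision


def raw_features(a: Author) -> list[float]:
    # log1p tames heavy-tailed counts such as page views
    return [math.log1p(v) for v in (a.article_length, a.article_age_days,
                                    a.views_per_day, a.days_since_edit)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class NotabilityModel:
    """Logistic regression fit by batch gradient ascent (an assumed choice)."""

    def fit(self, authors: list[Author], labels: list[int],
            lr: float = 0.1, steps: int = 2000) -> "NotabilityModel":
        xs = [raw_features(a) for a in authors]
        n, d = len(xs), len(xs[0])
        # Standardize each feature with the training mean and population std
        self.mean = [sum(x[j] for x in xs) / n for j in range(d)]
        self.std = [math.sqrt(sum((x[j] - self.mean[j]) ** 2 for x in xs) / n) or 1.0
                    for j in range(d)]
        zs = [self._scale(x) for x in xs]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(steps):
            # Residuals y - p give the gradient of the mean log-likelihood
            errs = [y - sigmoid(self._logit(z)) for z, y in zip(zs, labels)]
            self.b += lr * sum(errs) / n
            for j in range(d):
                self.w[j] += lr * sum(e * z[j] for e, z in zip(errs, zs)) / n
        return self

    def _scale(self, x: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def _logit(self, z: list[float]) -> float:
        return self.b + sum(w * v for w, v in zip(self.w, z))

    def score(self, a: Author) -> float:
        """Estimated probability that this author attracts attention."""
        return sigmoid(self._logit(self._scale(raw_features(a))))


def public_domain_year(death_year: int, term: int = 50) -> int:
    # Under a "life + term" rule, protection runs to the end of the calendar
    # year death_year + term, so works enter the public domain the next 1 January.
    return death_year + term + 1


def rank_entering(model: NotabilityModel, authors: list[Author],
                  year: int, term: int = 50) -> list[tuple[str, float]]:
    """Rank authors whose works enter the public domain on 1 January of `year`."""
    eligible = [a for a in authors if public_domain_year(a.death_year, term) == year]
    scored = [(a.name, model.score(a)) for a in eligible]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Under these assumptions, `rank_entering(model, candidates, 2016)` keeps only authors who died in 1965 (life plus 50 years) and lists them by descending score. Log-scaling and standardizing the features keeps heavy-tailed page-view counts from dominating the fit.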

9 of 55 comments

  1. Of the individuals who died in 1965 by Anonymous Coward · · Score: 3, Informative

    Just to be Anglocentric, I don't even see William Shakespeare as eligible on the new list.

    Maybe this should be recategorized as "funny things you can do with computers"?

    It's only authors who died in 1965. From the SUMMARY:

    Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world,

    1. Re:Of the individuals who died in 1965 by Crashmarik · · Score: 4, Informative

      It's only authors who died in 1965. From the SUMMARY:

      RTFA MAN

      http://publicdomainrank.org/

      Starts at authors who died in 1900. If you're going to completely misunderstand the meaning of the point and nitpick on petty details, at least get them right.

  2. Do not use algorithms ! by Anonymous Coward · · Score: 2, Insightful

    What a load of crap.

    This is why you get rubbish like the BBC destroying lots of "classic" early TV series (throwing the film into skips). But they made sure there was space for old episodes of Panorama, most of which involved cretins of the day talking shite which is irrelevant in a few years.

    The whole point of archiving is that you literally have *no clue whatsoever* what is going to be valuable in the future.

    If you did you would be a stock market billionaire multiple times over.

    1. Re:Do not use algorithms ! by CreatureComfort · · Score: 2

      The trouble is budgets and manpower.

      If you know you don't have the resources to save everything, you have to have some way of prioritizing.

      Personally, I would rather save one or two pieces from as many different authors as possible than try to get everything of the "most important" authors.

      --
      "Unheard of means only it's undreamed of yet,
      Impossible means not yet done." ~~ Julia Ecklar
  3. Ridiculous and sad by Katatsumuri · · Score: 4, Insightful

    Of the individuals who died in 1965 and whose work will enter the public domain next January

    This says so much about our culture...

    Are there jurisdictions where one could legally and openly operate a Project Gutenberg clone with more recent works?

  4. Bad ranking by aBaldrich · · Score: 2

    I really like G.K. Chesterton, but how can he be ranked higher than Arthur Conan Doyle and Sigmund Freud?

    --
    In Soviet Russia, the government regulates the companies.
  5. Life + 50 years almost everywhere by Katatsumuri · · Score: 5, Interesting

    I quickly checked Wikipedia, and most countries seem to stick with at least "Life + 50yr" term. That is a great achievement of the lobbyists.

    Some island nations seem to have no known copyright legislation, but they are still usually parties to some limiting international treaties, and also have similar restrictions under other names ("unauthorized copying", etc.)

    Seriously, is there no place on Earth with more reasonable terms?

    1. Re:Life + 50 years almost everywhere by tlhIngan · · Score: 2

      I quickly checked Wikipedia, and most countries seem to stick with at least "Life + 50yr" term. That is a great achievement of the lobbyists.

      Some island nations seem to have no known copyright legislation, but they are still usually parties to some limiting international treaties, and also have similar restrictions under other names ("unauthorized copying", etc.)

      Seriously, is there no place on Earth with more reasonable terms?

      You have to realize that most countries are bound by the Berne Convention w.r.t. copyrighted works. This is simply where all signatories have agreed to respect each other's copyright claims, and the convention sets a minimum term of the author's life plus 50 years for most works. Before that, well, an author could very well find their work pirated, and indeed, one of the biggest industries in the New World colonies was... piracy. Ben Franklin and others who owned printers realized that copyright didn't apply to them, so they promptly began making copies of everything: books, sheet music, etc.

  6. Losing Literature by Mikkeles · · Score: 2

    It may make more sense to concentrate on those lower in the list. The works of highly rated authors are likely to remain available anyway, whereas those of lower rated authors are more likely to be lost.

    Admittedly, the loss may be deserved, but I am willing to bet there are some (if not many) that will be more highly appreciated in a century or so.

    --
    Great minds think alike; fools seldom differ.