Slashdot Mirror


Amazon AI Researchers Release a Dataset of 400,000 Transliterated Names To Aid the Development of Natural-Language-Understanding Systems (amazon.com)

New submitter georgecarlyle76 writes: Amazon AI researchers have publicly released a dataset of almost 400,000 transliterated names, to aid the development of natural-language-understanding systems that can search across databases that use different scripts. They describe the dataset's creation in a paper [PDF] they're presenting at COLING, together with experiments using the dataset to train different types of machine learning models.

2 of 12 comments

  1. Pretty amazing by 110010001000 · Score: 2

    It is really amazing all the research they are able to do there. I would have thought the humidity and rain would wreak havoc with computers. Maybe it helps there is no Internet access as well, so they aren't distracted by social media and can focus on AI research.

  2. Not the usual NN/ML hype paper by isj · Score: 2

    The paper is informative. They point out the obvious problems (transliteration from scripts or orthographies that omit vowels), but also that many names are actually quite rare: in their dataset, 73% of the names occur only once.

    They also compare the results with traditional hardcoded rules and find that neural networks may not be better. So kudos for including non-positive results in the paper.