Crowd-Source Translation Software For Free Content?
yahyamf writes "I have a lot of free educational content in the form of audio lectures and text, which I'd like to translate into as many languages as possible. I would also like to transcribe the audio and create audiobooks from the text. There are already several volunteers willing to contribute, but I need some web-based software to manage all the work. Facebook is already doing something like this, but it is only for their content. I've also looked at Damned Lies, which is part of the GNOME project, but it doesn't seem to handle audio. Are there any other open source translation projects out there that I can customize and build upon?"
I'm pretty sure that the writer of TFQ is looking for software to coordinate a human speech-to-text effort (i.e., manage volunteer accounts, serve audio clips for transcription/translation, receive result files from the volunteers, and so forth), not speech-to-text software.
He is, in essence, looking for an audio equivalent of the interface used by the Distributed Proofreaders project, perhaps with a side of translation mechanisms similar to the ones used on Ubuntu's Launchpad or equivalent. Neither is particularly exotic technologically.
Such a setup is more or less prosaic in CS terms, and no major breakthroughs need to be made, but it would constitute a somewhat specialized flavor of content management system. I honestly don't know if anything of the sort exists.
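For the sake of argument, here is a minimal Python sketch of the coordination layer described above: hand out audio clips to volunteers and take transcripts back. Everything in it (the `Clip` and `TranscriptionQueue` names, the in-memory storage, the example.org URLs) is hypothetical and invented for illustration; it is not taken from Distributed Proofreaders, Launchpad, or any existing project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Clip:
    """One short audio segment awaiting transcription (hypothetical model)."""
    clip_id: str
    audio_url: str
    assigned_to: Optional[str] = None
    transcript: Optional[str] = None


@dataclass
class TranscriptionQueue:
    """In-memory stand-in for the database a real site would use."""
    clips: Dict[str, Clip] = field(default_factory=dict)

    def add_clip(self, clip_id: str, audio_url: str) -> None:
        self.clips[clip_id] = Clip(clip_id, audio_url)

    def next_clip_for(self, volunteer: str) -> Optional[Clip]:
        """Hand the volunteer the first clip nobody has claimed yet."""
        for clip in self.clips.values():
            if clip.assigned_to is None:
                clip.assigned_to = volunteer
                return clip
        return None

    def submit(self, clip_id: str, volunteer: str, text: str) -> None:
        """Accept a transcript only from the volunteer who holds the clip."""
        clip = self.clips[clip_id]
        if clip.assigned_to != volunteer:
            raise PermissionError(f"{clip_id} is not assigned to {volunteer}")
        clip.transcript = text

    def pending(self) -> List[str]:
        """IDs of clips that still lack a transcript."""
        return [c.clip_id for c in self.clips.values() if c.transcript is None]


if __name__ == "__main__":
    queue = TranscriptionQueue()
    queue.add_clip("lecture1-000", "https://example.org/audio/lecture1-000.ogg")
    queue.add_clip("lecture1-001", "https://example.org/audio/lecture1-001.ogg")
    clip = queue.next_clip_for("alice")
    queue.submit(clip.clip_id, "alice", "Welcome to the first lecture.")
    print(queue.pending())
```

A real system would also need accounts, timeouts for abandoned clips, and a review step, but none of that is exotic either.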
Hello,
At Transposh we aim to create such a project, one that will enable crowd-sourced translation of websites (and hence your scripts). No audio support is planned, though.
Currently we have a WordPress plugin, but a generic plugin is being written. Everyone is welcome to help.
Ofer
This doesn't handle audio, and the site doesn't even seem to be up, but it looks like roughly what you want:
http://blogoscoped.com/archive/2008-08-04-n48.html
The people over at BOINC have software called Bossa for distributed thinking projects (crowdsourcing). I am not sure of the current status of the project, but I have heard of at least one group that is trying to implement it.
This signiture copied from somewhere.
I'm a freelance translator and I'd like to warn you about the most serious pitfall of crowdsourcing: quality. I've seen Facebook's translation into my language (Polish) and it's terrible. There are other projects done this way and most of them are of extremely poor quality.
The problem is that if you want quality content, you need professionals to do the job. They don't necessarily have to be paid professionals (translators); they could just be people from your field who wish to contribute for some reason or other. But with crowdsourcing you have to expect a lot of poor translations, so you have to introduce some form of quality control. Hiring editors would be best, but maybe some kind of voting system would do.
Just don't let your content be translated without QA, because you won't sell much of it.
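To make the voting idea concrete, here is a rough Python sketch. The function name, the `min_votes` threshold, and the data shapes are all assumptions made up for illustration; this is not how Facebook or any other project actually does it.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def pick_translation(
    votes: Iterable[Tuple[str, str]], min_votes: int = 3
) -> Optional[str]:
    """Return the candidate translation with the most votes.

    `votes` is an iterable of (voter, candidate_text) pairs. Each voter is
    counted once (their last vote wins). Returns None when the leader has
    fewer than `min_votes` votes or is tied for first place, so the string
    can be escalated to a human editor instead of being published.
    """
    latest = dict(votes)  # later pairs overwrite earlier ones per voter
    tally = Counter(latest.values())
    ranked = tally.most_common(2)
    if not ranked or ranked[0][1] < min_votes:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    ballots = [
        ("ania", "Dodaj do znajomych"),
        ("bartek", "Dodaj do znajomych"),
        ("celina", "Dodaj do znajomych"),
        ("darek", "Dodaj przyjaciela"),
    ]
    print(pick_translation(ballots))
```

Anything that comes back as None still needs an editor, which is the point: voting only filters, it doesn't replace QA.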
Hold on there, cowboy. It's not that simple. In the US, work for hire status depends on three criteria, and those criteria are somewhat ambiguous as applied to university professors. Here is a more detailed discussion of the law. There isn't a clear legal precedent addressing the issue, but that's because the issue almost never comes up. The issue doesn't come up because there's a solid consensus in the world of education that the professor owns the copyright to things like lectures, textbooks, and journal articles. (Note that when it comes to articles, a journal that requires a copyright transfer asks the author, not the school, to sign it.) Regardless of the law, it's clear that there are overwhelmingly strong reasons (e.g., academic freedom) why universities know they shouldn't cross this line. It's sort of like Mia Farrow's famous remark that "you don't fuck the kids." Doesn't matter if it's theoretically legal to go there, you just don't go there.
More relevant questions to ask the OP would be (1) where we can take a look at these materials, and (2) whether he's put them under a free license such as CC-BY-SA. If the answer to #2 is no, then probably nobody will be interested in doing the translations for free.
In answer to the OP's original question, I know of two approaches that could be used. One would be to create a wiki of the English version and then allow translators to use the wiki to produce translations. Another would be to put the English version in some kind of format that's amenable to version control (e.g., plain text or LaTeX), and use version control software such as git.
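As an illustration of the version-control approach, a small Python script could report which files still need translating. The layout it assumes (one subdirectory per language code under the repository root, with identically named files in each) is purely hypothetical, as is the function name.

```python
from pathlib import Path
from typing import Dict, List


def missing_translations(root: Path, source_lang: str = "en") -> Dict[str, List[str]]:
    """Map each target-language directory to the source files it lacks.

    Assumes a hypothetical layout of one subdirectory per language code
    under `root`, with identically named files in each.
    """
    source_files = sorted(
        p.name for p in (root / source_lang).iterdir() if p.is_file()
    )
    report: Dict[str, List[str]] = {}
    for lang_dir in sorted(d for d in root.iterdir() if d.is_dir()):
        if lang_dir.name == source_lang or lang_dir.name.startswith("."):
            continue  # skip the source itself and hidden dirs such as .git
        present = {p.name for p in lang_dir.iterdir() if p.is_file()}
        report[lang_dir.name] = [n for n in source_files if n not in present]
    return report


if __name__ == "__main__":
    # Run from the repository root; prints one line per target language.
    for lang, names in missing_translations(Path(".")).items():
        print(lang, "is missing", len(names), "file(s):", ", ".join(names))
```

This only checks that a file exists, not that it is complete or up to date with the English source; for that you would have to compare commit history.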
I have some experience with this because I wrote some CC-BY-SA-licensed physics textbooks, and over the years I've been contacted by roughly 10 people who were enthusiastic about translating them. None of those people ever translated any significant amount of text. It's a huge amount of work to do this kind of translation, and people's enthusiasm seems to evaporate quickly. A good example of the fragility of enthusiasm, in a slightly different context, is Wikibooks, which is basically an abysmal failure, at least if you compare what it has accomplished over all the years of its existence with its original stated goal, which was to revolutionize education. Writing or translating a book is just too much work for most people to tackle without some kind of financial or nonfinancial reward. It's not analogous to software, which is a functional product rather than a creative one.
Find free books.
It handles texts, not audio, but Open Source Mission's Gospel Translations might be a useful model. They work with publishers/rights-holders (if any) to get the right to post works, then coordinate translations to a huge variety of languages. Once a translation is done, they post/host it for free. The translations are developed using a Wiki. Their focus is on Christian works, but I think the approach would work for any literature you want widely distributed in a variety of languages.
- David A. Wheeler (see my Secure Programming HOWTO)
You may want to contact the folks at Librivox.org. They're currently making audiobooks of the Project Gutenberg content, and they have a system in place for handling the audio files and quality control. It sounds very much like what you're looking to do. Perhaps they'd either let you use them to host projects or at least give you pointers on how their software and processes work, so that you could create something similar without completely reinventing the wheel.
Indeed. I hope you don't mean that in a pejorative sense? When TFQ is asking about translation, it's perfectly appropriate for professionals in the field to chime in with their insights and expertise.
There was an article recently in the Japan Times about a project at the University of Tokyo to build a very similar system, though it is apparently just for texts being translated into Japanese. For the curious: http://search.japantimes.co.jp/cgi-bin/ek20090422a1.html. I don't agree with some of the pronouncements in the article (understanding the nuances of the source text and accurately conveying those in a fluently written target text does indeed take some skill, whereas the article and even the project name Minna no Honyaku suggest that 'anyone can translate!'), but the project itself looks interesting. The project site is http://trans-aid.jp/ (Japanese only).
Perhaps the submitter of TFQ could contact Professor Kyo Kageura, mentioned in the article, to find out more about the Minna no Honyaku system? It's basically crowdsourcing for translation projects that don't merit the time, money, and quality of professional translation, which kinda sounds like what they're looking for.
Cheers,
"What in the name of Fats Waller is that?"
"A four-foot prune."
See also http://www.meedan.net/
Also, Google has a translation widget that might be a reasonable stop-gap measure.
http://translate.google.com/translate_tools?hl=en