Crowd-Source Translation Software For Free Content?
yahyamf writes "I have a lot of free educational content in the form of audio lectures and text, which I'd like to translate into as many languages as possible. I would also like to transcribe the audio and create audiobooks from the text. There are already several volunteers willing to contribute, but I need some web-based software to manage all the work. Facebook is already doing something like this, but it is only for their content. I've also looked at Damned Lies, which is part of the GNOME project, but it doesn't seem to handle audio. Are there any other open source translation projects out there that I can customize and build upon?"
Are they your lectures, and who owns the copyright on them: you or the university? Since your work product was for hire...
I've fallen behind on my Web 2.0 buzzwords. What the hell's a crowd source? I was thinking someone or something that draws crowds, like Obama or double-jointed Swedish twins. Unenlightened minds want to know!
Now was that too hard?
You've got to be kidding.
There is currently software that can parse speech into text, just not very well, especially if you're dealing with multiple speakers, variable-quality audio, etc.
There is also software that can translate text between languages, again not very well. There's a reason professional translators are still in high demand (even for written text alone).
You're looking for open source software that can combine both those into something effective? If you don't mind the translated audio being practically useless, then you might be able to find something.
I hate to be the bearer of bad news, but... good luck with that.
You can try OLPC, but they're too busy reinventing the wheel... er, the GUI (Sugar) to be bothered about trivial things like "educational content".
Plone.
Hello,
At Transposh we aim to create such a project, one that will enable crowdsourced translation of websites (and hence your scripts); no audio is planned, though.
Currently we have a WordPress plugin, but a generic plugin is being written. Everyone is welcome to help.
Ofer
Here's what I think is the best way to facilitate "crowdsourced" translation: write a "semi-automatic" translator. That is, you have to spoonfeed it information about the grammatical function and meaning of all of the text, which significantly simplifies the problem of automatically translating it. Then, you can turn over any text to crowdsourced translation. Instead of having to know two languages, all that the crowd has to know is what the text actually means, which then allows them to disambiguate it for the program.
It's relatively easy to do sanity checks too: The "clarifier" just tells it to translate into the target language and back, giving the worst (or "lowest probability") possible translation consistent with the disambiguation constraints, which tells the crowd what they need to further clarify the meaning of. Plus, you only have to do the clarification once for each text, instead of once for each target language.
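A minimal sketch of the round-trip check described above, in Python. It assumes a hypothetical translator that returns candidate readings of a sentence (back-translated into the source language), each tagged with the word senses it assumes and a probability; the `Candidate` type, the sense labels and the numbers are all invented for illustration. The crowd's clarifications act as constraints, and the check surfaces the least probable reading that still satisfies them, which shows the crowd what is still ambiguous.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One reading proposed by a hypothetical translator, back-translated to the source language."""
    text: str
    senses: dict = field(default_factory=dict)  # word -> sense this reading assumes
    probability: float = 0.0                    # translator's confidence in this reading


def consistent(candidate, constraints):
    """True if the reading agrees with every sense the crowd has pinned down."""
    return all(candidate.senses.get(word) == sense
               for word, sense in constraints.items())


def worst_consistent(candidates, constraints):
    """Lowest-probability reading that still satisfies the constraints, or None."""
    survivors = [c for c in candidates if consistent(c, constraints)]
    return min(survivors, key=lambda c: c.probability, default=None)


# "She saw the bat by the bank." has two ambiguous words (made-up probabilities).
EXAMPLE = [
    Candidate("She saw the flying bat by the riverbank.",
              {"bat": "animal", "bank": "river"}, 0.4),
    Candidate("She saw the baseball bat by the riverbank.",
              {"bat": "club", "bank": "river"}, 0.1),
    Candidate("She saw the flying bat by the financial bank.",
              {"bat": "animal", "bank": "finance"}, 0.3),
    Candidate("She saw the baseball bat by the financial bank.",
              {"bat": "club", "bank": "finance"}, 0.2),
]

if __name__ == "__main__":
    # Clarifying only "bank" still lets the odd "baseball bat" reading through,
    # which tells the crowd that "bat" needs clarifying too.
    print(worst_consistent(EXAMPLE, {"bank": "river"}).text)
    print(worst_consistent(EXAMPLE, {"bank": "river", "bat": "animal"}).text)
```

If even the worst surviving reading is acceptable, the text is clarified enough. Because the constraints describe meaning rather than any particular target language, the same markup could be reused for every language.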
I actually started developing this a few years ago, and even hired someone to develop it. (Just the interface, though, which allows you to easily mark up the text and see what you've done to it.) I had even contacted several law firms for a patent search, but strangely, all of them told me that this would put them in a conflict of interest with a large corporate client. Too bad we haven't seen results from them...
So, who's implemented my idea already?
This doesn't handle audio, nor does it even seem to be up, but it seems to be roughly what you want:
http://blogoscoped.com/archive/2008-08-04-n48.html
The people over at BOINC have software called Bossa for distributed thinking projects (crowdsourcing). I am not sure of the current status of the project, but I have heard of at least one group that is trying to implement it.
So... you need software to manage work being done by a large number of people?
Any bug tracker will do the job (Bugzilla, tracker, etc.).
Create a bunch of bugs for the things you need done and assign them to people. People can discuss them, upload solutions and discuss those solutions, upload patches for issues, file new bugs for newly required translations, etc.
No need to create new software for something this simple and generic.
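A rough sketch of the bug-tracker approach, independent of any real tracker's API: one task ("bug") per content item and target language, handed out round-robin to volunteers. The item names, language codes, volunteer names and the "NEW"/"RESOLVED" status strings are placeholders; with Bugzilla or similar you would file and update each task through the tracker itself.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Task:
    """One tracker entry: translate one content item into one language."""
    item: str
    language: str
    assignee: str
    status: str = "NEW"


def plan_tasks(items, languages, volunteers):
    """One task per (item, language) pair, assigned to volunteers in turn."""
    people = cycle(volunteers)
    return [Task(item, lang, next(people)) for item in items for lang in languages]


def open_tasks_for(tasks, person):
    """Tasks a given volunteer still has to finish."""
    return [t for t in tasks if t.assignee == person and t.status != "RESOLVED"]


if __name__ == "__main__":
    tasks = plan_tasks(["lecture-01", "lecture-02"], ["es", "fr", "pl"], ["ana", "ben"])
    tasks[0].status = "RESOLVED"  # ana has finished lecture-01 -> es
    for t in open_tasks_for(tasks, "ana"):
        print(t.item, t.language)
```

Round-robin is just the simplest policy; in practice volunteers would more likely pick tasks for the languages they actually know.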
- Jesse McNelis
http://jessta.id.au
We've built an online tool (vSync) that does exactly what you need. It has been used by Stanford University (see http://ecorner.stanford.edu/authorMaterialInfo.html?mid=1532) and the Cisco Institute.
Feel free to contact me directly at ogi (at) tunezee.com for more details.
Consider using http://99translations.com/. They have a good interface, several OSS efforts use them for internationalization, and I'm pretty sure they have a "free" option. YMMV.
I'm a freelance translator and I'd like to warn you about the most serious pitfall of crowdsourcing: quality. I've seen the Facebook translation into my language (Polish) and it's terrible. There are other projects done this way, and most of them are of extremely poor quality.
The problem is, if you want quality content, you need professionals to do the job. They don't necessarily have to be paid professionals (translators); maybe just people from your field who wish to contribute for some reason or other. But with crowdsourcing you have to expect a lot of poor translations, so you need to introduce some form of quality control. Hiring editors would be best, but maybe some kind of voting system would do.
Just don't let your content be translated without QA, because you won't sell much of it.
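The "some kind of voting system" suggested above could start as simply as this sketch: volunteers submit candidate translations for a segment, others vote them up or down (one vote per person per candidate), and a candidate is accepted only once it reaches a minimum net score and is the sole leader. The class name and the default threshold of 3 are arbitrary assumptions; an editor could still review whatever gets accepted this way.

```python
from collections import defaultdict


class SegmentVotes:
    """Candidate translations of one text segment, with one vote per voter."""

    def __init__(self, min_score=3):
        self.min_score = min_score        # net votes needed to accept (arbitrary choice)
        self.scores = {}                  # candidate text -> net score
        self.voters = defaultdict(set)    # candidate text -> people who voted on it

    def submit(self, text):
        self.scores.setdefault(text, 0)

    def vote(self, text, voter, up=True):
        """Count one up/down vote; unknown candidates and repeat votes are ignored."""
        if text not in self.scores or voter in self.voters[text]:
            return
        self.voters[text].add(voter)
        self.scores[text] += 1 if up else -1

    def accepted(self):
        """The sole top-scoring candidate if it meets the threshold, else None."""
        if not self.scores:
            return None
        best = max(self.scores.values())
        leaders = [t for t, s in self.scores.items() if s == best]
        if best >= self.min_score and len(leaders) == 1:
            return leaders[0]
        return None


if __name__ == "__main__":
    seg = SegmentVotes(min_score=2)
    seg.submit("Witaj, swiecie")
    seg.submit("Czesc, swiat")
    seg.vote("Witaj, swiecie", "ola")
    seg.vote("Witaj, swiecie", "ola")   # duplicate vote, ignored
    print(seg.accepted())               # still below the threshold
    seg.vote("Witaj, swiecie", "jan")
    print(seg.accepted())
```

Requiring a unique leader means a tie keeps the segment open until someone breaks it, which is a reasonable place to route the segment to an editor.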
"I have a lot of free educational content in the form of audio lectures and text"
Are you by any chance an Amway salesman trying to get attention?
It handles texts, not audio, but Open Source Mission's Gospel Translations might be a useful model. They work with publishers/rights-holders (if any) to get the right to post works, then coordinate translations to a huge variety of languages. Once a translation is done, they post/host it for free. The translations are developed using a Wiki. Their focus is on Christian works, but I think the approach would work for any literature you want widely distributed in a variety of languages.
- David A. Wheeler (see my Secure Programming HOWTO)
You may want to contact the folks at Librivox.org. They're currently making audio books of the Project Gutenberg content, and they have a system in place for handling the audio files and quality control; it sounds very much like what you're looking to do. Perhaps they'd either let you use them to host projects or at least give you pointers on how their software and processes work, so that you could create something similar without completely reinventing the wheel.
Indeed. I hope you don't mean that in a pejorative sense? When TFQ is asking about translation, it's perfectly appropriate for professionals in the field to chime in with their insights and expertise.
There was an article recently in the Japan Times about a project at the University of Tokyo to build a very similar system, though it is apparently just for texts being translated into Japanese. For the curious: http://search.japantimes.co.jp/cgi-bin/ek20090422a1.html. I don't agree with some of the pronouncements in the article (understanding the nuances of the source text and accurately conveying those in a fluently written target text does indeed take some skill, whereas the article and even the project name Minna no Honyaku suggest that 'anyone can translate!'), but the project itself looks interesting. The project site is http://trans-aid.jp/ (Japanese only).
Perhaps the TFQ submitter could contact Professor Kyo Kageura, mentioned in the article, to find out more about the Minna no Honyaku system? It's basically crowdsourcing for translation projects that don't merit the time, money, and quality of professional translation, which kinda sounds like what they're looking for.
Cheers,
One option worth looking at is Transifex. It's being very actively worked on, with the release of the next version imminent. Also, a hosted version is planned, so you eventually won't even need to maintain a server to run it on. It works with the big five FOSS VCSes, and the new version will be able to crack open tarballs as well. The Fedora Project has been using it for a couple of years now, with great success.
www.castingwords.com
www.icanlocalize.com has some interesting offerings in the way of workflow & price, especially if you are already using a CMS like Drupal 6.
If you use Drupal, for example, you can set it up so that as soon as you 'publish' (or at least advance the content in the workflow process), it is made available for professional translators to begin working on.
The pro translators have their own web-enabled interface and toolset (I think) which is similar to www.trados.com in translation memory function.
If you are a non-profit, iCanLocalize allows your organization to use the translation interface yourself for free. Otherwise it costs 0.05 USD per word if you do it yourself. Professional translation is dirt cheap in comparison, I think, at 0.07 per word. The last time I checked, a few years ago, the same thing (with a less elegant workflow) cost 0.16 euro per word in Northern Europe.
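To put the quoted rates in perspective, a back-of-the-envelope calculation for a hypothetical 10,000-word body of text. The rates are simply the ones mentioned above (and have likely changed since); the 0.07 rate is assumed to be in USD like the 0.05 one, and the Northern European figure is in euros, so that total is not directly comparable without an exchange rate.

```python
# Per-word rates as quoted in the comment above; currencies as labelled.
RATES = {
    "iCanLocalize, do it yourself (USD)": 0.05,
    "iCanLocalize, professional translation (USD, assumed)": 0.07,
    "Northern Europe, a few years earlier (EUR)": 0.16,
}


def cost(words, rate_per_word):
    """Total for a job billed per word, rounded to cents to hide float noise."""
    return round(words * rate_per_word, 2)


if __name__ == "__main__":
    words = 10_000  # hypothetical size of the material to translate
    for label, rate in RATES.items():
        print(f"{label}: {cost(words, rate):,.2f}")
```

For 10,000 words that works out to 500, 700 and 1,600 respectively, so under these assumptions professional translation costs only about 200 more than doing the work yourself.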
See also http://www.meedan.net/
Also, Google has a translation widget that might be a reasonable stop-gap measure.
http://translate.google.com/translate_tools?hl=en
Social Translator (http://socialtranslator.org) is already doing crowdsourced translation and is becoming very popular. They are also introducing audio and APIs that can hook into existing content management and forum systems. Check them out; they are probably exactly what you're looking for.
http://socialtranslator.org is already doing this and is very popular.
For speech-to-text, an obvious place to start is the long-standing ViaVoice engine, if you can figure out how to buy it: the page for that info is empty or broken.
You're probably looking for something like Pootle. It is used by a number of projects doing localisation, including Creative Commons, OpenOffice.org and others.
It allows online translation and management of translation projects. It translates Gettext PO (for software localisation) and XLIFF (XML Localisation Interchange File Format) files; because it uses standard localisation formats, it makes it easy to manage both online and offline translations. The Translate Toolkit can be used to convert various formats into PO or XLIFF for online translation.
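For anyone who hasn't seen XLIFF, a minimal sketch that writes a tiny XLIFF 1.2 document with Python's standard library: a `file` element carrying the source and target languages, and one `trans-unit` per segment with a `source` and an optional `target`. This only illustrates the format; it is not how Pootle or the Translate Toolkit produce their files, and the file name, IDs and language codes are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # serialize as the default namespace, no ns0: prefix


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{NS}}}{tag}"


def build_xliff(segments, source_lang, target_lang, original="lecture.txt"):
    """segments: iterable of (id, source_text, target_text_or_None)."""
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for seg_id, source, target in segments:
        unit = ET.SubElement(body, q("trans-unit"), {"id": str(seg_id)})
        ET.SubElement(unit, q("source")).text = source
        if target is not None:  # untranslated segments simply have no <target>
            ET.SubElement(unit, q("target")).text = target
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xliff([(1, "Welcome to the course.", "Witamy na kursie."),
                       (2, "Lecture one", None)], "en", "pl"))
```

Keeping untranslated segments without a `target` makes it easy for a tool to show translators what is still left to do.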
I'm not sure exactly what you need to do with the audio: do you want to overdub or use subtitles? If you want subtitles, you can use sub2po from the Translate Toolkit to convert subtitle files to Gettext PO. You'd still need to subtitle the files first; then you could put them on Pootle to allow anyone to translate them into their language.
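To show roughly what the subtitle-to-PO step produces, a stdlib-only sketch that turns SubRip (.srt) cues into Gettext PO entries: one `msgid` per cue, an empty `msgstr` for the translator, and the cue timing kept as a `#.` comment. This is not sub2po's actual implementation or exact output (use the real tool in practice); the function names are hypothetical, and it assumes well-formed SRT with blank lines between cues.

```python
import re

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def po_quote(text):
    """Escape text as a single-line PO string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def srt_to_po(srt_text):
    """Convert SRT cues to PO entries with empty msgstr, ready for translators."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # a cue needs an index, a timing line and at least one text line
        match = TIMING.match(lines[1])
        if not match:
            continue  # skip blocks whose second line is not a timing line
        text = "\n".join(lines[2:])
        entries.append(
            f"#. {match.group(1)} --> {match.group(2)}\n"
            f"msgid {po_quote(text)}\n"
            'msgstr ""'
        )
    # Minimal PO header entry declaring UTF-8.
    header = 'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"'
    return "\n\n".join([header] + entries) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nWelcome to the course.\n\n"
        "2\n00:00:04,000 --> 00:00:06,500\nToday we cover\ntwo topics.\n"
    )
    print(srt_to_po(sample))
```

Keeping the timings next to each entry means a translated PO file can later be turned back into a subtitle track for the same audio.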
If you have documents in OpenDocument Format (ODF), then you can use the Toolkit's odf2xliff converter to allow those to be translated on Pootle.
Pootle allows people to translate online or offline and they can commit their work directly to a version control system from within Pootle. This allows you to automate most of the process for you and your users.