SpinVox "Recognition" Is Often Expensive Human Transcription
An anonymous reader writes "SpinVox offers to convert voice messages to text using a system called D2 or 'the Brain.' According to BBC News, said 'Brain' is often of the old-fashioned kind: SpinVox is sending private voice messages to South Africa, the Philippines, and maybe Egypt to be typed by people in a call centre, despite being registered as keeping all private data inside Europe and claiming that the text is somehow anonymised. Insiders say they transcribed 'love messages, secret messages' and everything else from beginning to end, and the company is being bled dry by the cost: SpinVox has been locked out of one of their data centers over a payment dispute. SpinVox refuses to comment further on details — but according to their web page, they're 'enabling the Speech 3.0, Voice 3.0, and Business 3.0 markets,' whatever that means."
Best algorithm, ever.
It may be they lied about keeping user supplied data in house, and they may have implied that they used advanced technological means to do the transcription, but if their service does what it says I can't blame them for using human labour to do the transcription. Human brains remain the only high performance computer manufactured with unskilled labour.
Scientists point out problems, engineers fix them
altslashdot.org: The future of slashdot.
Now with 20% more vowels!
We're not even done with Bubble 2.0 yet!
Seriously. If their target market is English speakers and the people doing the translating don't speak English as their primary language... dude. Seriously. Never mind the privacy issues here...
That's awful.
By the way I'm releasing a new text-to-speech service; the algorithm makes for a very smooth speech. It does however have a little bit of an accent.
They just need more of that.
From their PDF:
Speech 3.0: Fully-hosted, commercial strength SLAs, proven scale and reliability - no CapEx. Scales on demand to 150m capacity
So Speech 3.0 provides 150 meters of service-level agreements with no experience-point cap.
Voice 3.0: Superior and proven range of voice products. We repeatedly deliver great, mass-market experiences with our expertise in marketing and management of all lifecycle stages.
Voice 3.0 takes you from larva, through pupa, all the way to butterfly, and then you die and get eaten.
Business 3.0: Mature yet flexible business models - designed to adapt to the dynamics of service brands we partner with, from on-demand to full lifecycle revenue strategies
Business 3.0 is apparently a flexible business model where they interact with their partners. So that's new I guess, no one has thought of that yet. It's also where people who write marketing buzzwords go to die.
"Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
The speech recognition people have broken their promises for several decades now. Using humans is still the only working speaker-independent way to do it.
What I find surprising is that it is apparently not cost effective. Here is an alternate approach: Have people transcribe it, but let them look at "pictures" as a reward. Seems to be working well in breaking captchas, so why not for this?
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Spinvox has a denial here, claiming this is a case of disgruntled employees spreading falsehoods.
Of course one'd expect them to deny it, but they've just upped the stakes. They would be in violation of UK privacy laws *and* lying through their teeth if this denial is false.
Go somewhere random
That's nothing, I just bought an application that converts my speech to text. Read that back to me. I said, read that back to me. God damn it, what the hell is wrong with this thing. Stupid blinking light, what the hell is that supposed to mean? This is... oh here we go. No, don't send
<Complete your profile by adding a signature!>
They could go a step further and use the strategy used to crack captchas: make humans "solve" the problem of telling what is being said in a sound file in order to access the next part of a porn image, or offer some other kind of non-economic incentive. It doesn't have to be the full message, just the parts between pauses or things like that.
Human transcription performed on industrial scale by non-native speakers is nothing new. For example, medical imaging texts are typed up by Cheap Foreign Labour from voice messages recorded by doctors. ;)
So remember this next time you read the analysis of your expensive MRI test.
every problem looks like a nail.
When all you have is six billion renewable-fueled, autonomous, self-replicating, self-housing, self-programming, hundred-billion-node neural networks...
who the fuck needs an AI for voice recognition?
Losing your job to Bender: technological progress.
Losing your job to Apu: outrage.
But really, what's the difference? A service is a service. It's all progress .. sort of.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
South Africa and the Philippines have large English-speaking populations.
If I can't understand a Geordie, let alone a god damn American, how the fuck will a computer? I doubt the Africans/Asians (who, despite the above claims, probably speak the Queen's English a damn sight better than most of you guys (assuming slashdot is populated by gorram Americans)) will get it spot on, but their internal algorithms have had a data set of at least 18 years to train on; this beats any automated system! Voice recognition* has its places (e.g. the iPhone does it right), but transcription is not one of them. If humans work best (and I'm pretty fucking sure they will), just use humans and perhaps use automated cleanup on the input (remove names) and the output (use grammar checking).
*s/Voice recognition/Any natural language input/g
IranAir Flight 655 never forget!
They don't even need to have speech recognition; they just need to recognize when a few words are spoken and have people listen to individual words.
Meet the Mechanical Turk.
One "Aw, Shit!" is worth 100 "Ata boys!"
I have been using SpinVox on my phone for almost two years. It works great, and I don't ever get phone messages that have private or sensitive info. Even if they come out and say that they've been using people all this time, I'd still want to continue using the service. It's been great rarely having to listen to voice messages. In the past, my messages would build up for weeks to the point that my mailbox would be full and then I'd go through and delete them. If I had a missed call I'd call back and not listen to the message first. Now, I get an email almost immediately and can conveniently read the message. Maybe I'm just super lazy, but I like the service, privacy issues be damned.
The real problem is that people have lost their heads in the United States. The return of evangelicals has led to an atmosphere that is literally opposed to science. So, you get exactly what you expect: opinions that are based on anecdote and wishful thinking instead of data. The reason science works is that you start with the assumption that you don't know something until you can prove that you probably know it, with repeatable, verifiable results. When you start trusting the word of pill junkies and homophobic college dropouts versus the entire scientific community and their reams of data, get ready for some wide-reaching and catastrophic fuckups.
Canada kept the rules. The Canadian banking system is still the most sound. Every time we take cops off the financial beat, we end up with a banking crisis. These realities can be arrived at by simply reading about the last 30 years of panics, and the hundred years of bank panics that existed before the FDIC and sensible Great Depression legislation.
But leave it to the same fuckers from Harvard, who apparently can't even manage a college trust without running it into the ground.
The pro-market propaganda will continue, and probably destroy our economy beyond repair. And then some wise ass will say that it shows that the market does work, by wiping itself out.
This is an unprecedented marketing opportunity and I think it is working!
Winkey shortcut mapping for 64bit windows. WinKeyPlus
This has parallels with the main premise of Sleep Dealer http://www.imdb.com/title/tt0804529/
A theme in the film is Virtual Labor: the robots of the future will really be remotely operated by cheap overseas labor. SpinVox is doing a similar kind of thing, but unlike Mechanical Turk it adds the factor of outsourcing to low-wage regions.
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
What about Google Voice transcription? It seems to do such a good job I always suspected it was Google's private version of Amazon Mechanical Turk.
Kriston
No way, a service that was not what it seemed, nor as secure? How could this happen in Europe of all places?
6.8SPC TR of 550, l xwind at 6, drift rt at 26" drops 77". AT has 503 ft-lbs at 1403 fps. FT 0.86
I've been wondering about "image ATMs", which accept checks for deposit, imaging them. I've had one correctly accept a check with the amount handwritten in cursive. I suspect that at least the hard cases are being referred to humans for recognition.
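A rough sketch of what "refer the hard cases to humans" could look like. This is only a guess at the general pattern, not how any bank or SpinVox actually does it. The names `recognise`, `machine`, `human` and the 0.9 threshold are all made up for illustration: the recogniser reports a confidence score, and anything below the threshold goes to a person.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Result:
    text: str
    source: str  # "machine" or "human"


def recognise(
    item: str,
    machine: Callable[[str], Tuple[str, float]],  # hypothetical: returns (text, confidence 0..1)
    human: Callable[[str], str],                  # hypothetical: a person types what they see/hear
    min_confidence: float = 0.9,                  # assumed cut-off, not a real-world figure
) -> Result:
    """Use the machine's answer when it is confident, otherwise ask a human."""
    text, confidence = machine(item)
    if confidence >= min_confidence:
        return Result(text, "machine")
    return Result(human(item), "human")


if __name__ == "__main__":
    # Stand-in recogniser: confident about printed amounts, unsure about cursive.
    def fake_machine(item: str) -> Tuple[str, float]:
        return ("$120.00", 0.98) if item == "printed" else ("$1?0.00", 0.40)

    def fake_human(item: str) -> str:
        return "$150.00"

    print(recognise("printed", fake_machine, fake_human))
    print(recognise("cursive", fake_machine, fake_human))
```

The cost question raised elsewhere in the thread comes down to how often the second branch is taken.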
Vista file copy is O(w^2) where w is the amount of time a normal person is willing to wait for a file to copy.
Even after decades of Moore's Law advances, computers still can't even convert speech to text with any kind of reliability. Artificial intelligence capable of passing the Turing test? Please.
Speech recognition is just so trivial a thing compared to human-level AI, it's like the difference between learning to walk and flying to the moon. And we still can't even walk properly without falling down all over the place yet.
I can guarantee real AI will never be achieved in my lifetime. And quite possibly not in anybody's lifetime.
There was an official response to those accusations: http://blog.spinvox.com/ It's quite interesting.
You guys do know that many, many South Africans speak English as their first language, right?
Most South Africans I know speak better English with fewer weird accents than the UK population. I've never had to try and understand what is meant by 'Arwight mate, innit' while talking to a saffa :)
Canada kept the rules. The Canadian banking system is still the most sound.
Think about what a "sound banking system" actually means. It means that old money stays that way. It means that generation after generation, the same banks gain more and more power and get to call more and more of the agenda. Stable banking systems are good for people who are already wealthy and powerful. Wiping out unwisely invested wealth punishes the greedy and gives the have-nots a new opportunity.
But leave it to the same fuckers from Harvard, who apparently can't even manage a college trust [vanityfair.com] without running it into the ground.
They had a spectacular run for a decade and now they are making room for some other university to take the top spot. I think that's good. Why should Harvard remain the wealthiest and most powerful university in perpetuity? What would be so good about a system in which, once you accumulate wealth, you, your family, or your organization just keeps it forever?
Change and shaking things up are good. We need financial crises and recessions if we don't want to stagnate or accumulate a de-facto nobility.
What society can do is make sure that nobody starves when everything comes crashing down, and we have mostly done that in the US.
I totally called this, back in 2007, when LiveJournal started to use SpinVox's services.
I was suspicious at the time, and started to look for information. What I found made me absolutely sure that at least part of it wasn't actually as automated as it was made out to be, and in fact, gave me the distinct feeling that it was mostly manually done by humans.
I started to write an article on the subject that I was going to publish in the LJ community "no_lj_ads". Being a Support volunteer, I had access to the feature before it was released for general use, and I was able to make some observations. However, although I made good progress on the article, it was never finished. There were lots of points to make, and it wasn't long after that that LiveJournal was the subject of a controversy known as "Strikethrough". The article got buried on my computer and forgotten about, half-finished.
In 2008, I dug up the article again, completed it using notes that I had left, and reposted it to my LiveJournal. I'll reproduce it here, too, because I think people will be interested.
Remember, this article was originally made in 2007. Because of that, some of the links are now defunct. The article has been slightly edited in places in order to note where this is the case; these edits will be noted [2008: Like this!] or [2009: Like this!], depending on whether I noticed it in my 2008 reposting, or in this 2009 reposting.
On to the article!
My brother uses Spinvox and I have always taken great pleasure in leaving messages designed to mess with the speech rec. Things like adding in random words in the middle, leaving a message consisting solely of the car registration plates I could read walking down the street... that sort of thing. Even funnier to think some poor soul probably had to transcribe these things.... :)
That said, I would assume that the transcriptions are probably going towards the data for the language model. If you don't collect relevant data you can't do the recognition. Presumably the plan is to reduce the proportion of transcribed messages over time. Of course, they should be splitting the messages up into sections for transcription so no single transcriber gets the whole of a message. As long as you split them at silent points that should be OK.
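For what it's worth, the splitting idea above can be sketched in a few lines. This is a toy under stated assumptions, not anything SpinVox is known to do: audio is a plain list of amplitude samples, "silence" means a run of samples below an arbitrary threshold, and `split_at_silence` and `assign_round_robin` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


def split_at_silence(
    samples: List[float],
    threshold: float = 0.02,  # assumed: amplitudes below this count as silence
    min_silence: int = 3,     # assumed: this many quiet samples in a row make a gap
) -> List[Segment]:
    """Return the non-silent stretches, cutting only at sufficiently long gaps."""
    segments: List[Segment] = []
    start = None  # index where the current spoken stretch began
    quiet = 0     # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the stretch just before the quiet run began.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Message ended mid-stretch; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet))
    return segments


def assign_round_robin(
    segments: List[Segment], transcribers: List[str]
) -> Dict[str, List[Segment]]:
    """Deal segments out in turn so consecutive pieces go to different people."""
    work: Dict[str, List[Segment]] = {name: [] for name in transcribers}
    for i, segment in enumerate(segments):
        work[transcribers[i % len(transcribers)]].append(segment)
    return work


if __name__ == "__main__":
    audio = [0.5, 0.4, 0.0, 0.0, 0.0, 0.3, 0.2, 0.1, 0.0, 0.0]
    parts = split_at_silence(audio)
    print(parts)
    print(assign_round_robin(parts, ["alice", "bob"]))
```

With at least two transcribers and two segments, nobody gets the whole message, though a short message with no pauses would still go to one person intact.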
I've used SpinVox for years and it was pretty obvious to me that it was a person doing the work - it was far too good. Friends went through a stage of sending increasingly bizarre messages to see if I would get something sensible and I generally did.
"...they're 'enabling the Speech 3.0, Voice 3.0, and Business 3.0 markets,' whatever that means."
So is that like 5G or something?
Bah, marketing buzz words...
Say what you will about that movie, but one scene struck me: Borat sleeps on the ground outside one of those born-again megachurches, and a crowd arrives for a Sunday service. Instead of trying to help him, churchgoers just step over Borat and try to ignore him on their way to the service. If that isn't the antithesis of every teaching of Jesus in the New Testament, I don't know what is.
A useful English speech recognition software 'Wave To Text v5.2'. Help you convert your voice to text in real-time, while the program's wizard enables you to convert your Windows Audio WAV files (speech recorded) offline. http://www.111download.com/product/wave-to-text-v.html
"...they're 'enabling the Speech 3.0, Voice 3.0, and Business 3.0 markets,' whatever that means."
It obviously means they're outsourcing all the work to 3.0rd world countries.
Never attribute to malice what can be adequately explained by ignorance or stupidity. -Isaac Asimov