Siri, Cortana and Google Have Nothing On SoundHound's Speech Recognition
MojoKid writes: Your digital voice assistant app is incompetent. Yes, Siri can give you a list of Italian restaurants in the area, Cortana will happily look up the weather, and Google Now will send a text message, if you ask it to. But compared to Hound, the newest voice search app on the block, all three of the aforementioned assistants might as well be bumbling idiots trying to outwit a fast talking rocket scientist. At its core, Hound is the same type of app — you bark commands or ask questions about any number of topics and it responds intelligently. And quickly. What's different about Hound compared to Siri, Cortana, and Google Now is that it's freakishly fast and understands complex queries that would have the others hunched in the fetal position, thumb in mouth. Check out the demo. It's pretty impressive.
Or does it just stare at you stupidly because using ways to give you directions means nothing if it doesn't recognize the homophone.
Is it just my observation, or are there way too many stupid people in the world?
Sure this isn't some Baidu thing?
“He’s not deformed, he’s just drunk!”
There is a lot of hype about this. Feels a bit like a marketing push. Great demo though.
Slashvertisement!
Siri, Cortana and Google are pretty bad compared to the mobile app of "Dragon NaturallySpeaking". Nuance has been the king of voice recognition for both consumer and military use. I doubt soundhound can beat them. If they do, they are in line for some hefty contracts.
I tried sending a text with Google's voice engine last week just to try it out. It did a very good job of turning my dictation into text, then it asked if I wanted to send. I said yes. It spelled out "yes" in its little window, then asked again. I said yes again. I tried other words, and it recognized those too, but every time it just asked me again if I wanted to send. I finally reached over and hit the send button.
The preceding post was not a Slashvertisement.
Could you suck SoundHound's cock a little harder? This is the most shameless bullshit I've seen all day, and I just watched Kanye West talk for 30 seconds.
1. This demo was likely created by an engineer or sales person with SoundHound. More impressive would be a demo by a third party journalist or reviewer without a vested interest.
2. The impressive speed probably won't scale to the millions of simultaneous users Siri, Google Now, and Cortana support (assuming audio is processed in the cloud, which I admittedly don't know for sure).
3. Obviously the demo uses phrases that work. I guarantee you an ordinary person will often get "Sorry, I didn't understand the question" or whatever SoundHound's equivalent is.
4. While it sounds impressive at first blush, nobody really cares how many days it is between next Tuesday and Christmas of 2025. And that happens to be not only useless, but also pretty easy to special-case in your expert system / AI logic. So how about a demo that answers the question: "How can you make a mushroom omelette without soggy mushrooms?"
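To illustrate point 4, here is a minimal sketch of how little logic the date question needs once the speech has been turned into text. The function names and the reading of "next Tuesday" as the first Tuesday strictly after today are assumptions made for illustration, not anything SoundHound has described.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0


def next_weekday(today: date, weekday: int) -> date:
    """Return the first date strictly after `today` that falls on `weekday`."""
    days_ahead = (weekday - today.weekday()) % 7
    if days_ahead == 0:
        # Assumption: said on a Tuesday, "next Tuesday" means a week from now.
        days_ahead = 7
    return today + timedelta(days=days_ahead)


def days_from_next_tuesday_to_christmas(today: date, year: int) -> int:
    """Days between the coming Tuesday and December 25 of `year`."""
    return (date(year, 12, 25) - next_weekday(today, TUESDAY)).days


if __name__ == "__main__":
    print(days_from_next_tuesday_to_christmas(date.today(), 2025))
```

The hard part of the demo is recognizing the words and mapping them onto a call like this one, not the arithmetic itself.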
Script reading call-centre staff will be made redundant or downsized.
Banks, utilities, booking agencies, insurance sales ... all will use automated customer service, perhaps with switch through to a human operator on demand (at which point higher charges will kick in).
And brace yourself for robotic surveys and sales calls that sound uncannily like real people.
Your digital voice assistant app is incompetent. ...bumbling idiots trying to outwit a fast talking rocket scientist. ...
hunched in the fetal position, thumb in mouth.
Do you have to be such a douche about it?
systemd is Roko's Basilisk.
Slashdot why so corrupt?
This is the most impressive tech-demo I've seen in years. I don't think even the computer from star-trek was that responsive.
"you bark commands"
I'm pretty sure you don't.
I don't want to say "woof" to my phone, and I'm pretty sure even if I did, Hound wouldn't know what to do with the command, since I can't actually speak dog and I'm guessing that Hound doesn't either.
This Space Intentionally Left Blank
Great, it can answer questions. That's a narrow use for a voice app.
Can it add entries to your calendar, launch apps, play music, place calls, and read your email?
That's the real question and a true test of voice recognition software.
https://www.youtube.com/watch?...
Don't just game, Dungeoneer
I've seen tech demos where you can control your TV with a few slight gestures. But in the wild you have to wave your hands like you're trying to get someone's attention from across a football field. I'll believe this company's claims when they're vetted by a number of impartial reviewers/consumers who aren't being censored either via an NDA or by the company holding the video camera. Until then it's about as believable as Mountain Dew glow sticks and cold fusion.
Most people have the memory of a goldfish, but I don't. Last time such a bizarre voice recognition breakthrough was claimed, the con went like this. A company arranged for a bunch of HUMAN listeners to sit in front of a network of computer screens, and the calls that were supposed to get MACHINE voice-to-text transcription were actually processed the old-fashioned way: dictated to a person.
The company invited people to phone its special number, and receive an email with the transcription of their message, 'proving' the astonishing efficiency of their 'algorithms'. It had a MASSIVE promotion on Slashdot at the time, but Slashdot never bothered with the follow up exposing it as a con (surprise, surprise).
Such cons are about sucking in massive amounts of VENTURE CAPITAL, frequently from ill-informed idiots from the Middle East with more (oil) money than sense.
Kickstarter shows how lame and hopeless ideas pushed by people with near zero skills can raise astonishing amounts of money, if the fantasy appeals to nerds enough. Here's a clue for the clueless. Google is actually the R+D arm of the NSA. In the areas where Google excels, what it does best CANNOT be beaten significantly at any given moment in time. Anyone claiming otherwise can safely be dismissed as a LIAR.
Incremental improvements, and identifying fertile grounds for future research are a different issue, of course. But if you can't smell a con like this a mile off (and with that earlier con, using Humans not computers, I guessed the con the moment I read about it here) you really lack the ability to ever apply sanity tests to the 'facts' of a given situation.
This post feels like it came straight out of someone's social media marketing team. I can see them sitting around with the "how do we get the nerds excited so tech crunch will care?" question and then you have this post...
Aside from the voice recognition, that all seemed like pretty basic stuff. Coding up the rules for dates and places doesn't seem like a particularly hard problem. The "OMG you can say 'and' and ask two different questions?!?" thing seemed especially lame.
... also, I can kill you with my brain.
Rigged demo by marketing department works great. News at 11.
One of the problems with Apple's Siri when it launched was slow response times. When you've got to have all the voice traffic transmitted over the net to the server, processed, and results returned - it causes some lag. When you've got millions of users using the thing regularly, you introduce real challenges getting all of that data processed near instantly.
With SoundHound's improvements, I suspect people will be encouraged to speak in longer, run-on sentences, as they think while speaking about all of the conditions they want on a given search. That's going to mean even MORE data to transmit up to a server and parse before a response can be sent back.
Holy crap, the video is impressive. It clearly parses phrases and dependent logical statements like "what is the population of the capital of the country in which the Space Needle is located." It also parsed paragraph-long multi-part questions. I was floored.
As for homophones, how do you (a human) recognize them? Well, you parse the logical context. If you are doing single-word dictation, homophones will always be a problem, but for queries there's context. And the demo shows this thing can handle some staggering conditional contexts and long phrases. So I would guess that if your query is not ambiguous in its use of the word Waze, then this thing is approaching a level where it will indeed get the right homophone.
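A toy sketch of what "parse the logical context" could mean for the Waze/ways case: score each candidate spelling by how many of its cue words appear elsewhere in the query. The cue lists and the `pick_homophone` helper are invented for illustration. A real recognizer would use a statistical language model, and nothing here reflects how Hound actually works.

```python
# Hypothetical cue words that tend to co-occur with each spelling.
CONTEXT_CUES = {
    "Waze": {"using", "open", "app", "navigate", "directions", "launch"},
    "ways": {"many", "several", "different", "both", "other", "some"},
}


def pick_homophone(words, slot_index, candidates=CONTEXT_CUES):
    """Choose the spelling whose cue words overlap most with the rest of the query.

    `words` is the tokenized query and `slot_index` marks the ambiguous word.
    Ties go to the first candidate in dictionary order.
    """
    context = {w.lower() for i, w in enumerate(words) if i != slot_index}
    scores = {cand: len(cues & context) for cand, cues in candidates.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(pick_homophone("give me directions using <?>".split(), 4))
    print(pick_homophone("how many different <?> can I cook eggs".split(), 3))
```

With single-word dictation the context set is empty, every candidate scores zero, and the choice is arbitrary, which is exactly the case where homophones stay a problem.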
Some drink at the fountain of knowledge. Others just gargle.
I wonder how easy it will be to translate.
There's a good chance that its effectiveness is based on a neural network that is trained in English.
Or, even more likely, a hardcoded "neural network" (because performance). Maybe they used a real neural network to be the "source code" for a "silicon compiler" of the hardcoded implementation.
I like SoundHound much better than Shazam. It works quite well (but craps out on classical).
I'm also skeptical that Apple will let it be deployed for the iPhone/iPad. Note that they don't seem to be in a hurry to release an iOS version (which would probably be fairly easy).
You've proven my point. It doesn't exist for the computer because it doesn't really understand speech.
https://youtu.be/Gqdy1jLlf50?t... is how it's pronounced by its creators, but don't just take their word for it - try Google Translate and have it pronounce the two for you: https://translate.google.com/?...
It's identical. It's a problem that will occur with most "hip" app names which sound like a common word, but which are spelled differently.
And it works about as well as a five-year-old trying to translate. But the video made it look freaking awesome.
"Please buy us out!"
Gentlemen, it cannot be overstated just how morose and puerile our competitors are. When gazing into the sound runes to build our auditory stage of power and wisdom to obey your every utterance, we ensure the glyphs we've created in the language our tribes wrote millennia ago are in fact purified in the basking glow. This glow, which emanates from the third eyes of our laureate engineering continuum, is a holy projection of the very notion of every sound that could be or has ever been uttered from the mouths of mankind. Siri, the cumbersome blind shitlord of the tortured Mac user, is no more a competitor to our brand than an idle pebble on a playground. Google itself, we have determined through our pure truth, is to sound and hounds no more distinguished than a window-sucking illiterate toddler mumbling nonsense in the corner of a cut-rate kindergarten in a rough side of town.
Good people go to bed earlier.
https://www.youtube.com/watch?...
https://www.youtube.com/watch?...
I got this installed yesterday, tried a couple of things, and it failed both times while Google got them right. It is quick, but it really isn't very accurate at all.
Those are silly toys.
Until they can be integrated into the OS, there is no way this can be called useful.
Not until you can talk to it in the middle of any app and say things like "phone, select OK on that annoying dialog" or "phone, open dropbox and delete all my files".
I mean really, really, really fast. No lag at all.
Almost like it was precognitive and knew what was going to be asked.
Use fresh mushrooms. Heat butter in a skillet until it begins to brown, or alternately heat the skillet until a drop of water takes longer to boil off (due to the Leidenfrost effect) and then add butter. Place the mushroom slices in a single layer on the skillet, taking care not to overcrowd the pan. Do not stir them. When the mushrooms start releasing liquid on their surfaces, or equivalently if you hear a subtle decrease in the volume of the frying (due to the water being boiled off), flip the mushrooms over individually with a fork. Let them brown for two or three minutes, and meanwhile add salt and pepper. Optionally, as you remove the pan from the heat, add a couple tablespoons of sherry to deglaze the pan, and toss the mushrooms briefly until the water is driven off and the mushrooms are well coated. Place the mushrooms on a paper towel to absorb any excess moisture.
Pretty much a daily thing for me. In fact, I'm a bit hungry right at the moment...
http://www.wolframalpha.com/input/?i=+what+is+the+population+of+the+capitol+of+the+country+in+which+the+space+needle+is+located
If it can't understand a Scottish accent it's a fail.
Because, you know, that is the state of the art in speech recognition, and it is going to stay that way until actual AI gets discovered (no, it has not been so far, and it is unclear whether we will ever have it). That some tool can successfully pretend to be a bit less of a bumbling idiot is not impressive at all.
Nonetheless, the usual idiots will hail this as the coming of a new age and, if lucky, the company behind it will get a lot of undeserved profits.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I watched this some days ago (/. isn't the place to read things first anymore) and came away half impressed and half underwhelmed.
The speech recognition part is nice, and that's understating it a lot given the complexity of the topic. That for a demo they'd use examples they made sure work nicely is a given. That it can understand fairly complex, disorganized questions is really cute. No, seriously, on this I am impressed.
But it is clearly still very far from human. It lands smack middle in the uncanny valley. It becomes incredibly clear when it talks about population numbers and lists them down to the last digit. Not only is that typical computer-ish, it's also vastly less useful than a human who would tell you "about 80 million".
When I ask my personal assistant device how long it'll take to get to city X, I'm not interested in an answer that says "3 hours, 57 minutes, 48 seconds". I want to hear "4 hours", because we humans understand it's an estimate anyway and a few minutes more or less doesn't matter.
Then again, when I'm building a bomb and ask my phone for the recipe, I'd like to have exact numbers. Again, a human would understand that in this situation, "about 200 grams" is not an ok answer.
This intelligence is still missing, and it's crucial.
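The rounding half of this is the easy part, and a rough sketch follows. The helper names, the 15-minute granularity, and the `exact` flag are all assumptions chosen for illustration. Deciding when to set `exact` (travel time versus a recipe) is the missing intelligence, and this code does not attempt it.

```python
from math import floor, log10


def humanize_duration(seconds: int) -> str:
    """Round a duration to the nearest 15 minutes and phrase it as an estimate."""
    rounded = 15 * round(seconds / 60 / 15)
    hours, minutes = divmod(rounded, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour" + ("s" if hours != 1 else ""))
    if minutes or not hours:
        parts.append(f"{minutes} minutes")
    return "about " + " ".join(parts)


def round_to_sig(n: int, sig: int = 1) -> int:
    """Round an integer to `sig` significant figures."""
    if n == 0:
        return 0
    exponent = max(floor(log10(abs(n))) - sig + 1, 0)
    factor = 10 ** exponent
    return round(n / factor) * factor


def spoken_number(n: int, exact: bool = False) -> str:
    """Give the full figure only when the caller says precision matters."""
    return f"{n:,}" if exact else f"about {round_to_sig(n):,}"


if __name__ == "__main__":
    print(humanize_duration(3 * 3600 + 57 * 60 + 48))
    print(spoken_number(80_716_000))          # made-up population figure
    print(spoken_number(200, exact=True))
```

The population figure in the usage lines is made up. The point is only that the same number can be rendered either way once something upstream has judged which answer the situation calls for.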
Assorted stuff I do sometimes: Lemuria.org
Namely the paid advertisement, and if you didn't get paid you likely just got played by someone's social marketing team.
I was surprised when I asked google now the same question in three different languages and it understood them (google now understands in which language the question was being asked most of the time).
Admittedly, those three languages are somewhat easy to tell apart (English, Spanish and Japanese).
Can the rest do that?
When I saw that there was a demo, I figured it meant I would get to dictate a voice question and have SoundHound answer it.
Watch a video? That isn't a demo. If all you can do is watch a prepared video, nothing has been demonstrated at all.
You might as well say Maelzel gave a "demo" of his mechanical chess player. In a non-interactive video, you don't even know for sure it's a machine answering the question or a little man hidden in the cabinet.
"How to Do Nothing," kids activities, back in print!
Probably because https://youtu.be/5FFRoYhTJQQ
How do you feel, Spock?