Open Source Speech Recognition - With Source
Paul Lamere writes "This story
on ZD-Net and this recent story
on Slashdot
describes the recent open sourcing of IBM's voice
recognition software. This release, unfortunately, doesn't include
any source for the actual speech recognition engine. Olaf Schmidt, a
developer on the KDE Accessibility Project,
is quoted as saying 'There is no speech-recognition system available
for Linux, which is a big gap.' In an attempt to close this gap, we
have just released Sphinx-4,
a state-of-the-art, speaker-independent, continuous
speech recognition system written entirely in the Java programming
language. It was created by researchers and engineers from Sun, CMU,
MERL, HP, MIT and UCSC. Despite (or because of) being written in the
Java programming language, Sphinx-4 performs as well as similar
systems written in C. Here are the release notes and
some performance data."
Ate lurks barry wall.
Quick someone port this to C.
"Open Source Speech Recognition - With Source"
"This release, unfortunately, doesn't include any source for the actual speech recognition engine."
These guys have built a Java-based middleware portion of their application suite that handles speech recognition and text-to-speech: www.voiceobjects.com. Seems like the VCs have this week's "big thing" keyed up.
-- http://www.criticalassets.com
Say it out loud and imagine bad speech recognition.
Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C.
I'm sick of these comments. Anyone who needs to know about the performance of Java knows it's very fast. Why bother commenting about it anymore?
It's like saying "... and because it was written in C, it's very fast...", as if we didn't know already.
When are we going to get GOOD text to speech, that uses modeled parameters of human vocal tracts rather than stitching together a bunch of pre-recorded phonemes?
Yeah i'm sure it's just that easy... you dumb f*ck
In OS/2. Really, it was just about a decade ago. It worked pretty well, especially when you take into account the computer power of the time.
Old and busted = voice recognition
New hotness = word spotting
When are we going to see software for Linux that allow us to search keywords in audio or video files like Dragon MediaIndexer does?
Colloquially known as "pointer-envy", this condition may affect all programmers, but is especially prevalent in java and C# developers. It is most easily recognized in a release announcement, where for no reason whatsoever the afflicted developer suddenly interjects a statement like "and it's just as fast as C", to the bewilderment of the audience.
Treat suspected cases with caution, and under no condition contradict the patient. There is no known cure.
:wq
Kind of an interesting side note: I have a professor who worked for IBM 30+ years ago, where he designed circuits to filter out frequencies above the third or fourth harmonic. It is somewhat difficult because the voice has no repeating patterns (like a complex sinusoidal wave) within a given word. So the engineers had to decide when it is appropriate to filter out the higher frequencies and still have the voice sound clear. They found that anything above the fifth harmonic didn't make a big difference, so for most cases they used the third or fourth harmonic.
--
Try Nuggets, the mobile search engine. We answer your questions via SMS, across the UK.
Now my linux box can wreck a nice beach!
#include "humorous_pop_culture_reference.h"
From dept-of-redundancy-department?
I'm not one to be picky about titles, but sheesh...
Eef ya uses thees one ya'lls gonna hafta talks like dis heers.
Given those build instructions, you are better off writing your own engine. This is exactly what is wrong with Linux today, and I don't see *any* solution to it. A maze of hidden dependencies and incompatibilities. No thanks.
Title: I'm(Aim) using(You Sing) it(Ate) right(Write) now(How)
Body: It(Ate) works(lurks) very(barry) well(wall).
Your CPU is not doing anything else, at least do something.
Personally, I find the genre interesting. However, it's useless as it's implemented in the Java proprietary language and if I understand correctly (didn't bother to look thoroughly), depending on closed-source libraries. Therefore it is no real free software. Have you noticed the .exe and .jar files in the CVS?
So how long before this is integrated with Asterisk for voice-activated Linux telephone apps?
Michael
reporter, stop replying to your replies as AC. get a backbone.
It looks like Sphinx-4 works on most (any?) platform that supports Java 1.1.4. So it's not just for Linux, but also MacOS, Windows, Solaris, etc.
"There is no speech-recognition system available for Linux, which is a big gap."
Um, Sphinx 2 (a predecessor of Sphinx 4) has been around for quite some time now. Like Sphinx 4, it's speaker-independent. Unlike Sphinx 4, it's a C library, and is thus easily interfaced with other languages (insert shameless plug for a simple Python interface for Sphinx 2 I wrote).
Speech recognition is one of the worst means of input there is for a computer. Keyboards work so much better. Even for those who don't have full use of their hands, there are many other options for user input, all of which are better than speech recognition. Worst thing ever is someone trying to use speech input in a cubicle environment.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Woman: [dictating into cell phone] To: Mike. I had fun last night.
Cell Phone: To: Mike. I have lip fungus.
Woman: [into cell phone, angrily] I had FUN, not lip fungus!
Cell Phone: I have fungus, not lip fungus.
Woman: I DON'T HAVE LIP FUNGUS!!!
ahhh, the troll has finally returned. and nice to see you putting some hate on Sun as usual. just like old times.
Maybe you like the Cambridge HTK better, then ;-)
I've used a few packages for speech recognition but none really got me too excited. Well, Dragon NaturallySpeaking did have me read a few chapters of Dave Barry to it. I bet it didn't work because of all the laughing; I was in tears.
I must say though that speech recognition is something that the whole computer community needs to work on. Now, we can finally do that. All the "open source community" needs is source that works a little. In a year or so, I bet this works better than most options available today.
Now, I know that isn't the rule but this is the type of thing that computer/math engineers could sit down to and contribute where others can't. It seems to be the rule that the really smart ones tend to work with open source software...
Really the cool thing is that this could get people involved who otherwise wouldn't because they don't know where to start.
Get your Unix fortune now!
I have already eaten lunch
It's called humor jackass...
Think really hard about this one...
Mississippi == Southerners
Southerners == Southern Accent
-wow, I'm not being a troll
Does this incarnation of Sphinx or any of the other open-source speech recognition systems allow us to process acoustic scores and potential phonemes while the user is still talking?
In other words, can we access a time stream of phone-probabilities as it is being updated?
Thanks!
AC
Hey moron, it's R2D2 that beep-booped. C3PO was fluent in over 6 million forms of communication. ;-)
Ba-dum-dum ding!
Computers are useless. They can only give you answers.
-- Pablo Picasso
You know, when I was growing up, we had to do all this hand-writing practice and public-speaking stuff, and we had to write letters and use the phone for everything. Then came PCs and the internet and everything was done using computer forms, spreadsheets and email. So naturally, I forgot all my skills in handwriting and speaking in favor of typing and clicking.
Now, years later, I'm given Palmpilots that I must scribble on and PCs that I must talk to, and everything is in multimedia streaming audio and video, including emails... Would you people make up your minds already?! Do I learn to type and fill out forms, or do I learn to write, speak and listen?? Please, please, please pick one mode or the other and stick with it... Sheesh!
"Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C"
It's amazing that the myth of Java being slow is so persistent. In fact, for computational tasks, many benchmarks have shown that a modern optimized JVM with JIT compilation is roughly equivalent to most implementations of C++, with some benchmarks being better for Java and some being better for C++.
Java *used* to be slow, in the days before optimized JIT JVMs. IMHO, another reason the myth persists is because Swing *is* slower than most UI toolkits in many cases, and it's easy to associate GUI slowness with overall slowness.
In my own case, for ease of cross-platform operation, I've ported several computationally intensive image processing programs from C to Java and have experienced a speed degradation of perhaps 10-15%. The Swing GUI, of course, feels more than 10-15% slower.
Resist! It's SUN trying to ruin Linux and OS again!
It even uses Java!!!! Slashbots must fight back!
"This data was collected on a dual CPU UltraSPARC(R)-III running at 1015 MHz with 2G of memory."
Looking at the performance data it just blazes along on that config. Not exactly what I'd call an embeddable system, though Microsoft might beg to differ.
Government of the people, by corporate executives, for corporate profits.
What is the problem? Speech recognition is a mature technology, and algorithms for speech recognition are well documented in the research journals. The federal government has long since stopped funding research into speech recognition.
And by mature you mean of course immature. Speech recognition is at the Model-T stage.
Once you get speaker-independent recognition with the same accuracy as humans for a price that's cheaper than hiring a secretary you can claim maturity.
Well struck Sir! I salute your Trolling mastery.
The speaker-independent feature is the best part. Not all words were recognized, only about 70%. Probably because I slur the other 30%. It works equally well with either my wife or myself issuing commands.
70% is more than I need for this particular project, but I'm sure this new release closes the gap even further.
I am billdar, and I approve this message.
Guess I won't be listening to music when root anymore. In fact I am soundproofing my room to keep the noises from infiltrating my microphone and causing me to accidentally delete /home
How the F*** is this an interesting comment...funny or flamebait I'd believe...insightful isn't even much of a stretch. Oh how much better this site would be if the mods didn't have so many chemical dependencies.
What is the problem? Speech recognition is a mature technology, and algorithms for speech recognition are well documented in the research journals. ... If you want to code some speech recognition software for Linux, just get a good book on C# ... and photocopy some relevant papers from the IEEE Transaction journals.
Let's apply your logic on why there isn't a good OS implementation of Java: What's the problem? Java is mature technology and its API is well documented. If you want to code some JRE or Java compiler for Linux, just get a good book on C# and download the API off of Sun's website.
Your reasoning is all wrong. Plus I'm still scratching my head on why you recommended C# over Java on LINUX.
Little Bricklets
Sorry for airing reruns but it's more relevant here. As I have said before:
Re NSF blowing a measly million to put speech recognition in silicon [for which there were many interesting and informative comments posted] I said:
Just a million? Pfft! I went down the tubes with one S.R. startup back in '92 that ate far more of some VC's money than that. Now NSF is not in it to get rich and I hope I am right in assuming that a successful chip design, if a mere $1000000 gets that far, would then be available at no fee to any foundry, or at least US foundry. OK, any foundry that wants to sell S.R. chips to the DOD. This lines up pretty well with IBM's recent give-away of its S.R. code: it is an admission that Speech Recognition is a commodity and nobody knows how to make any money with it so govt must fund further development.
BTW, automated recognition of music [as in "what is this tune I keep humming?"] has been on the drawing board at Philips over in the Netherlands for over a year. Philips isn't saying much. But it appears you have to have a pretty accurate sample to get recognition since they want to arrest your piracy based on this recognition...no S.R. software worth its $1000000 is that fussy about sound quality.
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
I could easily live with 10-15% slower, IF Java didn't have the startup overhead. I can run inetd-style fork-exec-terminate servers in C on CPUs that a cellphone would spit on, and handle hundreds of connections a second. Bringing up a JVM on the same processor would take minutes. Bringing up a JIT runtime would be out of the question.
For applications where you can create a JVM and use it as you need it, Java's great. Webservers, sure, no problem. Desktop applications, heck, the GUI overhead's getting to be the same order of magnitude (though that HAS to change, we can't afford to depend on Moore's Law much longer unless someone comes up with a clever way to cut the power consumption of processors faster than the speed increases). Browser plugins? For content, yes, but not for navigation... if it takes 10s to start up a JVM your customer's already hit "back".
English, that is. Homophones make the language a mess. Spanish might be better. Lojban would be quite cool, as that seems to be going along nice. All our languages will seem silly post-singularity.
-I am an elective eunuch.
That's what I want, not SR. I tried using the voice recorder feature on my PDA but it's not something I can use without a secretary to transcribe my voice into text. It's bad enough taking or leaving voice mail... it's just not my medium.
But if I could take those wave files from my PDA and convert them to text notes... even in the background offline after I sync, then they'd be useful. But you need accurate transcription for that. Is that in there?
needs an introduction to Jython.
Speaking of speech recognition, I recently called Telus and was surprised that I was directed to a service by a voice recognition system rather than the monotonous number inputs. As I've heard, Telus spent a lot of money on the system, and they don't want to scrap the project. Perhaps they can make use of IBM's technology.
Open Source Speech Recognition - With Source
Does it come "with au jus sauce" ?
Would that make it "with with source source" ?
"The grandparent post is intended for someone who actually understands digital signal processing and Fourier transforms."
hahahahaha. this is funny. You think you know anything about speech recognition because you know how to use Fourier transforms in DSP? Dear god, get a life. Speech recognition is EXTREMELY complicated. This is why no free programs for it exist.
You are, of course, perfectly correct in everything you said.
There are a number of HCI aspects where speech recognition is not a good solution.
However, let me enumerate a number of other ones, where it's superior:
Minutes of meetings, or similar. Imagine having a verbatim record of a discussion there by the time you get back to your desk.
Someone who cannot type - e.g. no hands. Rare, granted, but still a viable use.
Someone whose hands are busy. The canonical example here is a pathologist doing an autopsy, where they dictate everything. Speech recognition saves time in transcription (and money for the audio typist).
I'd love to be able to issue voice commands to a computer, for a few, isolated cases. For example, diagnosing hardware. Bring up a doc, and be able to get the computer to flip pages, without having to remove the probes from the hardware. Relocating them is a pain, and sucks time.
Moreover, I'm certain that there are others, some of which will only be realised when it's common and cheap enough to be widely available.
It's like a mouse. It's one of the worst general purpose input devices for a computer [0], but it excels at indicating a single element on a display. The mouse and keyboard complement each other, and there are a bunch of other, more specific input devices, such as the graphics tablet. I have no doubt that if speech recognition was as accurate and reliable as a graphics tablet, it would get a similar amount of use.
[0] Try inputting a block of prose with only a mouse. Even specialist software makes it only suck marginally less.
I love x10 :-)
Speech recognition is not really a solved problem. For some applications it works adequately, but if you take a look at the error rates for the Sphinx system to which the post links, you'll see that the Word Error Rate for large vocabulary is over 18%. Even for 5,000 words it is 7%. For many applications that is unacceptable.
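For anyone unfamiliar with the metric, here is a minimal sketch of how Word Error Rate is usually computed: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The function name and the sample sentences are illustrative only; they are not taken from the Sphinx-4 test setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word was dropped
            insertion = curr[j - 1] + 1   # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("it works very well", "ate lurks barry wall"))
```

Note that the rate can exceed 100% when the recognizer inserts more words than the reference contains.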
A second factor is that these statistical speech recognition systems require extensive data for their language model. Building such a system requires recording real speech, segmenting it and creating a set of examples from which to compute the probabilities, which requires some knowledge of acoustic phonetics, and doing the computation for the model. This is time-consuming.
Speech recognition technology isn't a dark secret, but it isn't trivial to create a system with good performance either.
Once the algorithm is determined, the coding part is easy.
For the OS, the theory is simple, but the coding is very hard.
Stop acting like a Korean bigot.
The very small vocabulary needed for desktop control makes the speech recognition much more accurate and usable.
Speech recognition seems similar to VRML. It would be really cool if it worked. But it never quite seems to work.
Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
With that said, you can probably guess I have a lot to say about Speech Recognition. (Not Voice Recognition, that's different, that would be able to distinguish Ben from Charlie for example.)
A good SR engine is, of course, essential. And I've not read the details on the two recent giveaways, but I suspect that they are only the engine.
The SR engine is just a beginning. There is a ton of UI work that needs to be done. Sit and think about spacing around punctuation marks and then think about capitalization around punctuation marks. Yeah, it is all pretty cut and dried and known but the details really need to be sweated to get it right. This is very time consuming.
Next you have to worry about exactly where you are editing. Is that into Microsoft Word (or Open Office), or emacs, or where? It can make a huge difference when you want to go back and correct misrecognitions. You just don't want to send N delete characters and retype it, that results in a lousy user experience. So just exactly where is the input cursor at all times? This is not an impossible problem, but one where the details must be sweated.
Next is command and control. Just how are you going to let the user grab the text of all the menus and all the text in the dialog box buttons? Again, not impossible, but more of those pesky details.
Finally, is your SR engine good enough? Maybe, maybe not. Let's just say that 98% accuracy might look good on paper, but that is one in 50 words wrong. Unless your correction mechanism is smooth, an error rate that high greatly slows you down.
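To put rough numbers on that, here is a small sketch of the arithmetic. The 98% figure is the one above; the dictation speed and the seconds lost per correction are assumed values chosen purely for illustration, not measurements.

```python
def words_per_error(accuracy: float) -> float:
    """Average number of words dictated per misrecognized word."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return 1.0 / (1.0 - accuracy)


def effective_wpm(dictation_wpm: float, accuracy: float,
                  seconds_per_correction: float) -> float:
    """Words per minute once time spent fixing misrecognitions is included."""
    seconds_speaking = 60.0 / dictation_wpm
    seconds_correcting = (1.0 - accuracy) * seconds_per_correction
    return 60.0 / (seconds_speaking + seconds_correcting)


if __name__ == "__main__":
    # 98% accuracy: about one word in 50 is wrong.
    print(round(words_per_error(0.98)))
    # Assumed: 120 wpm dictation, 15 seconds lost per correction.
    print(round(effective_wpm(120, 0.98, 15), 1))
```

Under those assumed numbers, dictation at 120 words per minute drops to about 75, which is why the smoothness of the correction mechanism matters as much as the raw accuracy.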
Is Open Source SR a good thing? Oh yes sir, yes! But let's not forget the details. One thing the Open Source community has been accused of, perhaps justly, perhaps unjustly, is not sweating the details.
Speech Recognition has an awful lot of details.
I was thinking about this the other day, and was wondering if this is a huge gap in the Windows user interaction model.
Think about how you input info using windows. You click on a few locations using the mouse, perhaps use some keyboard input, click some more. The output from these inputs is arbitrary: it may result in anything from a 'File/Save' dialog to a custom error dialog box. There is no linear path for inputting commands, or for mapping inputs to results.
Compare this to the command line. You enter a few distinct atomic commands, and view the results in the same medium. You then enter more commands, refining your actions. The key here is that you already have a linear model for input that produces well defined expected results, all in a common medium that is conceptually simple, visible to the user, and easily processed by machines. Extending this model to accept voice input or output is trivial.
How is one supposed to quantify basic tasks and turn them into equivalent voice commands without a baseline framework or paradigm to extend from? How do you automate, simplify, or extend existing tasks without a common input or output medium? GUIs provide no such medium or framework; that same framework is at the heart of the command line interface!
Perhaps this is why we never saw voice recognition technology take off on Windows. It's blinking impossible to script actions for an arbitrary task, let alone process the arbitrary results!
On a similar note we may see voice recognition on Linux take off like a rocket. Anybody can add voice recognition to perform almost any command because the actions are all scriptable through the CLI already. If you can type it, you can get your computer to do it when you say 'computer, foo!'
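A rough sketch of what that 'computer, foo!' glue could look like, assuming some recognizer has already turned the audio into a text string (the recognizer itself is out of scope here). The phrase table, the wake word handling, and the function names are all made up for illustration. Phrases are looked up in a fixed whitelist instead of being passed to a shell.

```python
import re
import subprocess
from typing import List, Optional

WAKE_WORD = "computer"

# Hypothetical phrase table: spoken phrase -> argument vector.
COMMANDS = {
    "list files": ["ls", "-l"],
    "show date": ["date"],
    "disk usage": ["df", "-h"],
}


def resolve(utterance: str) -> Optional[List[str]]:
    """Map recognized text to a whitelisted command, or None if no match."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    if not words or words[0] != WAKE_WORD:
        return None  # ignore anything not addressed to the computer
    argv = COMMANDS.get(" ".join(words[1:]))
    return list(argv) if argv else None


def run(utterance: str) -> Optional[str]:
    """Execute the matching command and return its standard output."""
    argv = resolve(utterance)
    if argv is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(resolve("Computer, list files!"))
```

Keeping the vocabulary to a fixed table also means the recognizer only has to tell a handful of phrases apart, and nothing it mishears can turn into an arbitrary command.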
Mars
P.S. It would be greatly appreciated if someone could please clarify my point. It's buried in there somewhere...
Replying to myself, because I just had one of those silly ideas, that Might Just Work (tm).
You know how the TV remote gets lost from time to time? And it's always a pain to find. Or that the remote is the other side of the room, so you have to walk away from the TV, pick it up, and end up moving further than to the TV itself?
Put a microphone on the set top box, and use voice recognition instead / as well as a remote.
Sonic contamination is easily solved, by subtracting out of the picked up audio the TV stream.
Use a keyword, then a simple set of commands (channel up / down, jump to channel, volume up / down, mute, maybe some menu/PVR functions).
For those that use MythTV boxes, this should be straightforward to set up, although subtracting out the final audio stream might be tricky. Might not be needed, depends on the room geometry.
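A toy sketch of the subtraction idea, under a strong simplifying assumption: the microphone hears the TV audio as a single delayed, scaled copy, with no reverberation. The code estimates the delay by cross-correlation and the gain by least squares, then subtracts. Real rooms are messier, and practical echo cancellers typically use adaptive filters, so treat this as an illustration of the principle only. All names are invented for the example.

```python
from typing import List


def estimate_delay(mic: List[float], tv: List[float], max_delay: int) -> int:
    """Delay in samples at which the TV signal best lines up with the mic."""
    best_delay, best_score = 0, float("-inf")
    for delay in range(max_delay + 1):
        score = sum(m * t for m, t in zip(mic[delay:], tv))
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay


def subtract_tv(mic: List[float], tv: List[float],
                max_delay: int = 50) -> List[float]:
    """Remove a delayed, scaled copy of the TV audio from the mic signal."""
    delay = estimate_delay(mic, tv, max_delay)
    # Pair each mic sample with the TV sample assumed to have caused it.
    pairs = list(zip(mic[delay:], tv))
    energy = sum(t * t for _, t in pairs)
    # Least-squares estimate of how loudly the TV reaches the microphone.
    gain = sum(m * t for m, t in pairs) / energy if energy else 0.0
    cleaned = list(mic)
    for i, (_, t) in enumerate(pairs):
        cleaned[delay + i] -= gain * t
    return cleaned


if __name__ == "__main__":
    tv = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, -0.5]
    mic = [0.0, 0.0] + [0.5 * s for s in tv]  # TV arrives 2 samples late
    print(subtract_tv(mic, tv, max_delay=4))
```

Whatever remains after the subtraction is what would be handed to the recognizer.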
That's the power of open source - no need to wait for someone else to implement it. I'm off to see if I can persuade mplayer and Sphinx to communicate...
Yea, we will all speak BASH. Seriously, the real problem is not speech recognition, it is in the area of speech understanding. A good example from an SR book from my college days.. "Please plant some more tulips." or was it... "Please plan sum ore two lips." It is not a trivial computer problem to resolve this. In fact, I would venture to say that once you have an algorithm to resolve the above then you probably also have a "sentient" computer that can pass the Turing test. That would be pretty sweet as you will have solved many other problems in the world.
While I've been waiting for Sphinx to mature into something useful for a long time now, the move to Java makes the whole package pretty useless to me.
:)
:P
Java is a memory hog, and it's certainly not going to be on any device I would want speech recognition on. Heck, I don't have Java installed on any of my machines, mostly because of the absolutely ridiculous footprint on disk as well as when running in ram.
And integrating Java applications into other applications is very difficult. Now, Java is good for certain things, but a speech recognition engine in Java sounds like the worst abuse possible
That and I still can't train it to recognise my slight Australian accent, unlike every other bit of SR software I've used on Win32
Whether or not Sphinx-4 works, and whether or not Java is 'fast' enough to do speech recognition processing, it's of no use to me.
Speech recognition has its uses but it's often overrated.
Thought-macro recognition has greater promise of revolutionizing things. Already animals are able to control devices and play games just with sensors hooked up to their brains.
Scenario: you're looking at something, e.g. a noticeboard, and your SuperPDA's camera sees the image as well (in fact it's continuously recording so you can choose to permanently record stuff that has occurred X seconds of buffer time ago (e.g. what happened?)).
You then mentally mark the top left and bottom right corners of the area you want to capture, and then mentally think "Capture". All using predefined "thought macros".
The captured image is immediately processed - text content converted to text, keywords weighted by context (with tweaks done mentally) and stored in an object database for future reference/retrieval.
Same for audio - you'll have a continuous recording going on all buffered. So you can capture at anytime. That's where the speech recognition could come in. But it has to recognize different voices etc. Maybe you need to record in stereo to make it easier.
to This?
you know, the speech recognition system that worked with a whole 11 virtual neurons and could distinguish between individual speakers perfectly over the roar of a jet engine?
...I got nothing.
korean bigot??? what??? reporter, is that you? why don't you reply with your login instead of hiding behind an AC name. sheesh, what a troll...
Little Bricklets
For a fictional example of serious problems resulting from exactly this sort of speech recognition ambiguity, read the book "Sewer, Gas and Electric" by Matt Ruff.
That's true. It is an underestimated problem. People assume that we can recognise a word by using just the sound of it. That is simply not true. When speaking at a reasonable speed humans do not utter words clearly. This is not a problem to us because we can guess the words by using context and semantics.
In order to have a good speech recognition system, the computer would have to actually understand the meaning of the sentences and put it in context. There are different levels of analysis necessary to do this. The system has to analyse sound, morphology, syntax, semantics and pragmatics. Each level contains ambiguity but when two levels are combined together some ambiguity is resolved and you get another piece of the puzzle. When everything is combined the ambiguous parts all converge towards the right meaning, the right syntax, the right morphology and in the case of speech to text, the right choice of word and spelling.
Now to do all that you ideally need sophisticated knowledge representation based on cognitive science and the way we think. Although there exist tricks and shortcuts that can mimic the important parts of the cognitive system, there isn't any complete system that integrates everything well.
Anyways if you want a summary of the field read the textbook: "SPEECH AND LANGUAGE PROCESSING" from Daniel Jurafsky & James H. Martin
And search on google for "computational linguistics" "word grammar" "open mind common sense" "cyc" "Ray Jackendoff"
Yeah, that'll work great until someone says
"R M forward slash R F enter"
There was a UserFriendly comic about that...
Mod me down and I will become more powerful than you can possibly imagine!
P.S. There are too many states nowadays. Please eliminate three. I am not a crackpot!
A lot of a person's speech recognition ability comes from context. And we still make plenty of mistakes - and we've been recognising speech all our lives. I think the next big breakthrough in speech recognition will come with better language analysis - not separating sounds into words but separating them into sentences, and then into words from there.
Sure, speech recognition is far from a done deal. No it's not easy - if it was then i'm sure we'd see a lot more of it in our everyday lives. But it's quite an exciting field.
---The U.S. Navy, which listens for the sounds of submarines in the hubbub of the open seas, is another possible user.---
Possible? I'd snatch that up and slap a classified sticker on that FAST.
Quite a while back, I wrote a piece called "The Talking Penguin", which I think might do what you ask. It starts ...
There is nothing inherently visual about computing. Digital processors read their instructions from files or streams of binary text. They report back to the outside world in the same telegraphic language, translated into character sets and painted onto a matrix of glowing (or light absorbing) pixels. The video graphics array, who needs it? Talking Tux needs the phonetic intonation string, then these missives could as easily be spoken.
Most developers expect to read the results of their efforts off of a screen; and the core of the system expects to issue streams of text that are displayed and not spoken. That is, the kernel is written to interface with hardware that writes, not hardware that speaks.
Interfaces on the kernel need to be handled with care. Linus Torvalds's paper in Linux World is worth reading in this context. The objection to extra interfaces on the kernel is that they are "fixed in stone". Once defined, interfaces must be preserved through all future releases of the kernel - or the new release will break existing code based on the old interface definition.
A talking kernel (strictly, a speech interface on the kernel) gives a system that can talk as it boots. This is much more useable and useful for non visual use than a system that only finds its voice once it has been successfully started and has then loaded the appropriate speech application software. Practically, this means the kernel must interface with some canonical or idealised generic speech device, the simpler the better. This means deciding how phonetic intonation strings should be written ...
this falls precisely into the Java trap. Please see RMS for details. This project while being "open source" requires a non-compatible runtime to be present.
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
... that C3PO's communications code HAD to have been open-source?
;-)
I mean, think about it. Where the heck is a slave kid on some remote desert planet going to get a library module with 6 million languages unless it's freely available for anyone to use?
> It's amazing that the myth of Java being slow is so persistant
Before you mod me down as a Troll, I work on a virtual machine as a hobby.
The problems with Java being slow have little to do with the "execution of code" part. The parts that take a hit are the Garbage Collector and the Class Loader. The latter causes a HUGE hit at startup. The former is responsible for those strange Swing freezes I've been seeing when I switch into a Java app.
Unicode also brings its own set of junk; for example, "Hello World" in dotgnu's JIT does 7302 hashtable inserts and 6000+ StringBuffer operations to initialize the Unicode encoder/decoder. And that is the standard way of decoding Unicode (mono uses the same code).
Lastly, C/C++ commonly uses a lot of fields while Java brings in get/set methods for these. A method call for a get or set is a LOT more expensive than a pointer read. Design has a lot to do with why Java is slow.
The enterprise apps where Java is popular are essentially backend applications which run for long periods of time (so have all the classes looked up and loaded) with a HUGE heap (256 MB or more) where occasional GC freeze won't destroy the entire experience (as it is often JSP/Web based interfaces).
Java *is* fast, if you don't count the slow parts.
Quidquid latine dictum sit, altum videtur
"R M forward slash R F enter"
(Pedant mode)
Actually, it's "R M space minus R F space forward slash enter"
Your command would attempt to delete the "rf" file in the root directory.
(/Pedant mode)
So.. it has come to this
> Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C.
Hahahaha!! LOL!! X-DDDDDDDD
X-DDDDDDDDDDDDDDDDD x''-DDDDDDDD
Subliminal information rules.. LOL!!
Personally, I suspect that very few would rate much better than "2nd grade".
To programmers, bite the bullet and learn to program vs program lite. I know it's work, but that's what getting paid is for. This may be trouble for those of you who played Everquest for the four years in college though. Darn.
You're both nazis.
--
Godwin
Isn't there a Linux tool (gcj, ???) that allows you to compile a Java application into a native standalone executable?
There was something I read a while ago about some smart people compiling the Eclipse IDE as a native Linux application from the Java bytecode.
Wouldn't that work with this ???
110% accurate? how would that work?
"you appear to have said 'errorprone' which I do not detect as good grammar, shall I correct this to 'prone to errors'" ?
It'd be like Clippy for voice recognition.. let's just stick to getting 100% accurate please.
Alma.
w w.memoire.com/guillaume-desnoix/alma/+&hl=en
It can read several high level languages, build an internal representation and then convert that to other high level languages.
It is a great tool to help port this software to C for example.
Unfortunately the site seems to have gone, although I have used this software in the past.
See the google cache though: http://66.102.9.104/search?q=cache:Dbw7OX6Tco4J:w
blog.sam.liddicott.com
good match :)
FreeTTS
http://freetts.sourceforge.net/
synthesis system by Sun, based on a state-of-the-art Flite/Festival system (CMU, Edinburgh)
Sphinx-4
http://cmusphinx.sourceforge.net/sphinx4/
as introduced in the topic
both available in opensource license,
and both written in Java.
I think some people should open their eyes, otherwise the world will leave you behind while you are happily consoling each other how Java is slow and unusable. Wake up, folks!
To the people who argue about hand-writing C and assembly: well, you obviously didn't try to implement any of the algorithms (like hidden Markov models or the statistical searches) used in speech recognition. It is a pain in the butt to do it even in Java, but at least you do not have the pointer mess you would have in C/C++. The engine has good performance already; I am not sure what you would gain by rewriting it, except for bugs (the older Sphinx2 was for sure buggy as hell).
Something about the memory footprint. Java can have a large memory footprint; however, with speech recognition you will always have it. Just the acoustic models for one language can easily be on the order of several hundred megabytes. The memory footprint of Java is completely irrelevant here.
And before somebody compares Sphinx with speech "recognition" on your mobile phone or in your car: be aware that you are comparing a skateboard with a Concorde here. The Sphinx family of engines is intended for recognition of continuous, large-vocabulary speech and to be speaker independent. Your phone/car is small vocabulary, single words and speaker dependent, i.e. a completely different problem. You cannot think about Sphinx as something "to have on some device". It is more intended to act as a speech recognition server on a dedicated machine, e.g. for a large call center or ticket reservation system. I guess it could also be used in KDE for the KAccessibility purposes, but it is a bit heavy for that (especially with the large datasets).
So next time, before you start spouting BS about Java and applications written in it, at least check the facts. People will not see you as a complete idiot.
It is interesting to note that there is/was another CMU project that carries a similar name: WebSPHINX (http://www.cs.cmu.edu/~rcm/websphinx/). It is also written in Java, but is not related to speech recognition; it's a small web crawler. Does anyone know why CMU projects like to use SPHINX for their names?
Simpy
Also known as "broadcast news" is 8.6% WER at 10xRT and 11.8% WER at 1xRT. 18.7% at 4xRT is so last millennium...
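For anyone wondering what those WER figures measure: word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer's output, divided by the number of words in the reference. A minimal sketch, assuming whitespace-separated words; the function name and the example strings are made up for illustration, and this is not the scoring tool used for the numbers quoted above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the recognizer inserts a lot of extra words, since the denominator is the reference length only.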
OG.
Just like many on Slashdot.
:)
Oh well, I suppose it is better than nothing.
Every year the Java naysayers get more and more frustrated and more desperate to find a reason that Java just won't do. For years it was that Java was too slow... that one was true for about 18 months in 1995. Well, maybe now that we can do crypto in Java, play DOOM in Java, and do speech recognition in Java we can finally put it to rest.
Next up - Java's footprint and startup time is too slow... Take a look at what they're doing in Java 1.5 to memory map and share core classes and pre-bind read only classes. Also think about the fact that all that work the HotSpot engine does to optimize things at runtime just gets thrown away every time the VM restarts and ask - why?
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
Or at least something that doesn't require me to say "enner" when I mean "enter". This is not a joke: ViaVoice on the iPAQ, used for navigating Outlook.
THIS is a joke.
Q: What do Americans call their dentists?
A: Dennis
WayBack has it.
I've also mirrored the source Just In Case (that's an ADSL link, you'd be better off downloading it directly from WayBack).
Got time? Spend some of it coding or testing
Now for a great idea (someone will patent it I'm sure):
Have your RSS feed reader queue up headlines or even synopses. Have your music player monitor the news queue and when there is enough stuff, wait for the current song to end and read the headlines as a news report. This would be easy with a nice integrated text to speech API. Get your tunes and news without interrupting your work.
Then I'd optionally want a virtual person (torso) in a small window with lip-sync to do the job. Combine speech recognition with gnome-storage and you'd have an office assistant that's worth having, so long as s/he's hot (i.e. not a paperclip).
Aim high and integrate what's already out there. It's not perfect, but it needs to be integrated with other stuff before people will really want to improve it a lot.
Speech reco *will* become pervasive in applications where it is not possible to use a standard keyboard/mouse interface, for example telephone services or on small form factor devices such as telephones and PDAs. However no one is going to make a great deal of money on this.
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
Note in the 2002 version that the dialog server is not included, this would be great to have too. MIT also has some very cool technologies in this area - SUMMIT, TINA, GENESIS, ... - which I do not believe are public, they just show little bits and pieces of PR about them, but include natural language parsing, question answering, sentence generation, etc. It would be cool if someone on the inside could document just what things are available, what works with what, what is definitely ready for prime time, etc. There must be some people who hacked on this in the past few years and are still developing things, it would be cool if some of their experimentation was available to the open source community so people could get an idea of what things are possible. When I did my survey just about 1 year ago, Communicator was daunting, intriguing, and it looked like you could do tons of stuff if you had some secret decoder docs and a spare year to hack. Maybe now's the time to dig into it hip deep?
A lot of a person's speech recognition ability comes from context
That is also true for a reco system. Typically a large-vocabulary continuous reco system will use a tri-gram language model in order to take into account contextual information. This means that the likelihood of each candidate word is looked up based on the previous two words, which has been shown to provide a sufficient degree of contextual information to distinguish between most commonly encountered homophones.
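To make the tri-gram idea concrete, here is a minimal sketch that counts word triples in a toy corpus and estimates the probability of a word given the two words before it. Everything here (the function names, the three-sentence corpus, the sentence-boundary markers) is made up for illustration. It uses plain maximum-likelihood counts, so any unseen triple gets probability zero; real recognizers smooth or back off to shorter histories instead.

```python
from collections import Counter


def train_trigrams(sentences):
    """Count trigrams and their two-word histories from tokenized sentences."""
    trigrams, histories = Counter(), Counter()
    for words in sentences:
        # Pad so the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigrams[history + (padded[i],)] += 1
            histories[history] += 1
    return trigrams, histories


def probability(word, history, trigrams, histories):
    """Maximum-likelihood P(word | history); 0.0 if the history was never seen."""
    history = tuple(history)
    seen = histories[history]
    if seen == 0:
        return 0.0
    return trigrams[history + (word,)] / seen


corpus = [
    "i want to write a letter".split(),
    "you have the right to remain silent".split(),
    "turn right at the light".split(),
]
trigrams, histories = train_trigrams(corpus)

# "write" and "right" sound the same; the two preceding words tell them apart.
print(probability("write", ("want", "to"), trigrams, histories))
print(probability("right", ("want", "to"), trigrams, histories))
```

The acoustic model alone cannot choose between the two homophones, but after "want to" the counts favor "write", and after "have the" they favor "right".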
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
A second factor is that these statistical speech recognition systems require extensive data for their language model. Building such a system requires recording real speech, segmenting it and creating a set of examples from which to compute the probabilities, which requires some knowledge of acoustic phonetics, and doing the computation for the model.
A language model is typically built from available text sources such as newspaper text. It does not require that you have recorded speech. Generating an acoustic model, on the other hand, does require accurately transcribed, recorded speech, and lots of it.
This is time-consuming.
Yes. Very.
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
And every year the Java yaysayers use the exaggerations of being 'ready-to-run' or 'cross-platform', which become less and less true every year as more different VM versions are deployed everywhere :)
:)
Sure, Java can do X, Y and Z, but if you're actually attempting to become a programmer the very FIRST question must be -should- you USE Java for X, Y, and Z?
For most cases of the above variables, the answer is no. Sure, you CAN, but you shouldn't. It's a matter of the right tool for the right job, and as somebody posted above, Java is becoming the new Lisp in terms of being used for things which are just silly. Java is not the swiss-army knife of programming languages by any means.
Breakfast served all day!
I built a few client apps that were deployed on a few different VM versions, though most were Win32 (1.3, 1.4, 1.4.2). I deployed to Macs without a problem.
Development was a snap. I got the whole application off the ground with relatively few problems because of the usefulness of Java's built-in API. Of course, when performance tuning I did rewrite the functionality of some of those API classes, but I'm sure you have to do that in any language.
Yes, the MS JVM is total crap, but that's what Sun got a huge settlement for. It was put in place by Microsoft in an attempt to shut down Java with a crappy install base.
Java is all about following standards. As long as you do that, your apps run pretty well.
This app that I wrote required a lot of Swing specialization and user interaction, displayed custom images, etc. It wasn't a trivial application.
So, I guess my question is, do you guys just not follow the standards? What is it that you're doing to break your apps so much?
Someone was talking about using lightweight components with heavyweight components, which I know from experience is a real beast to get working, but other than that, what is it exactly that's breaking all the time?
I'm talking about client-side apps here. I haven't used an applet in forever. Most applications on the web are jsp apps, so you are totally shielded from its Javaness.
A story about Java links to your post. I took the opportunity to respond to your post in that story. Would love to hear your comments.
Want to Know How to Cheat the GPL? Read On!
Though I responded to your thread, it was more of a question to the general populace.
If you didn't design the app, then it's not your fault, is it?
The solution to your problem is to call the developer and complain. If they don't do anything about it, then your solution is to switch applications.
The reason that Java is perceived as a bad platform compared to Windows is that Sun's engineers don't go through all of the major applications written in Java and re-engineer the platform to behave as each application expected.
That is, if you write an app that inadvertently depends on non-standard behavior in SP1, and it becomes really popular, MS will generally make it so that SP2 behaves the same way that SP1 did for your application. This is because users will largely blame MS for the app's problems and not the developer that didn't follow standards in the first place.
There was a story about this practice a while ago and how it might be changing with Longhorn.
Anyways, the point is that if you don't follow the standards Sun isn't going to save you. That might be detrimental to their image, but it's not really their fault. It's users like you that don't know any better that perpetuate this misconception.
http://jist.ece.cornell.edu/index.html
A poor tradesman blames his tools.
OK - so you're a customer, then what qualifies you to identify your suppliers as "Java Priests"? Personally, they sound behind the curve.
Client side executables in multi-tier apps were dead when I started work 9 years ago, and they have no place in my java religion.
In fact Swing is largely irrelevant (only 10% of java jobs require swing knowledge) in the Java world today -except- and here's the kicker for you - in the banking sector *LOL*, which seems slow to keep up with other IT sectors.
For example, have a look at jobserve.com, which places over 80% of IT jobs in the UK (it was a profitable dot.com even before 2k). Note that a search for (java AND j2ee) AND NOT (swing) turns up 939 results vs 131 for (java and swing).
I agree that there are still plenty of horrible swing app interfaces out there (Oracle 9i installer, SunOne app-server admin app, WAS 4 app-server admin app, etc...) but the message seems to be getting through. I mean WebLogic has been leading the way since forever, and IBM's WAS 5 admin application is now web-based.
Yeah, I realized it right after I hit submit.
:)
It just didn't seem worth making a correction.
Mod me down and I will become more powerful than you can possibly imagine!