oznoid · Slashdot Mirror

Re:Copyright on collection, not on recipes on The Open Source Cookbook? · 2002-07-24 00:12 · Score: 1

I find the GFDL to be unnecessarily restrictive for this. In fact, I find it objectionable. If I want to share recipes with my friends, I shouldn't have to abide by such an extensive social contract.

Re:Security Issues on The Problem Of Developing · 2002-02-26 23:38 · Score: 1

Actually, Perl runs on more platforms than Java... Perl is more portable than Java.

Re:I've waited too long to get the FP on George Soros Funds Open-Publishing Software · 2002-02-14 05:47 · Score: 0, Offtopic

> Information wants to be free.
So does my Johnson!

Re:To anyone who is wondering: this is a Big Deal on Linus Merges ALSA Into 2.5.4 · 2002-02-14 02:18 · Score: 1

It is a big deal. It'll make a big difference for those of us doing speech technology; the free OSS drivers were always weak on full-duplex, and to do speech in/out, we need it. Congrats to everyone involved -- which I guess is all of us.

Re:Have it talk on What if Harry Potter 5 Was an E-Book? · 2002-02-13 09:44 · Score: 1

Like translating through Babelfish, English to German to English to German, it eventually converges on something really funny. We did it with CMU Sphinx and Festival, by saying something into it and then having the synthesizer say what the recognizer heard... to the recognizer. Hours of fun.

Have it talk on What if Harry Potter 5 Was an E-Book? · 2002-02-13 09:35 · Score: 1

It'd be better if it spoke, then it could read to us.

Put it on the Device for Privacy, Security, and Ba on Text-to-Speech on a Low-Power Chip · 2001-11-07 10:09 · Score: 1

Here's some reasons why you want it on the device, and not on the server:

Privacy: I, for one, don't want to send my personal content through a portal provider. I don't want Microsoft getting all my mail, I don't want TellMe getting it either. And I don't want to have everything that I'm supposed to want available for me at the channel, with my usage stats, habits, and particulars sold to direct marketers or worse.

Security: The more places you ship the data around to, and the more intermediaries involved, the more possibilities there are for sniffing, bad security, leaks, and misuse. Passing things through a provider means trusting them to maintain security properly, and I for one don't trust many people enough to allow that.

Bandwidth: Alice in Wonderland can be shipped in full audio at several gigabytes, or shipped as a 100+ KB text file and synthesized on the device. Cell connections are terrible, despite what the telcos are pushing in their media campaigns -- coverage even in the Bay Area is spotty; you lose signal as you pass in and out of cells, and there are network overloads and outages. Keeping it down to small text streams and synthesizing on the device means getting one step away from the unreliable, low-bandwidth networks available today, and 3G is a long, long way off.
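To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration rather than a measurement: a reading time of about 3 hours, uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo), a 150 KB text file, and a 9600 bit/s cellular data link. The function names are made up for this example.

```python
def pcm_audio_bytes(duration_s, sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of uncompressed PCM audio (defaults assume CD quality)."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


def transfer_seconds(size_bytes, link_bits_per_s):
    """Ideal transfer time over a link, ignoring protocol overhead and dropouts."""
    return size_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    reading_time_s = 3 * 3600   # assumed length of the spoken book
    text_bytes = 150_000        # assumed size of the plain-text book
    link_bps = 9_600            # assumed cellular data rate

    audio_bytes = pcm_audio_bytes(reading_time_s)
    print(f"audio: {audio_bytes / 1e9:.2f} GB, text: {text_bytes / 1e3:.0f} KB")
    print(f"audio is about {audio_bytes / text_bytes:.0f}x larger than the text")
    print(f"text transfer:  {transfer_seconds(text_bytes, link_bps) / 60:.1f} minutes")
    print(f"audio transfer: {transfer_seconds(audio_bytes, link_bps) / 86400:.1f} days")
```

Compressed audio would shrink the gap, but under these assumptions the text stream stays several orders of magnitude smaller than the audio.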

Kevin

Free and Open Speech on IBM, TrollTech Integrate Linux Voice Recognition · 2001-02-01 01:53 · Score: 1

Of course, there are CMU Sphinx, FestVox, and Festival available under truly open source licenses. http://www.speech.cs.cmu.edu/

Hello, Dr. Pollack. on Ask Jordan Pollack About AI - Or Anything Else · 2000-04-06 06:16 · Score: 1

My RAAMs appear to be functional.

your student,
lenzo

Passle of robo projects at CMU on Autonomous Robot Explores Antarctica · 2000-01-31 18:25 · Score: 2

If you want to look over a bunch of robotics projects at CMU, here's a nice list. It's not complete, but there are a bunch of pictures of robots and links to more info.

Re:In The Mountains Of Madness on Autonomous Robot Explores Antarctica · 2000-01-31 18:19 · Score: 1

Maybe Nomad will turn up the Loc-Nar.

Re:Autonomous (NOT) Robots on Autonomous Robot Explores Antarctica · 2000-01-31 18:15 · Score: 1

Yes, Xavier often has his manservant with him, but it's been a while since he's had to hit the kill switch. Xavier and Amelia both come to my office and annoy me, unattended. I'm in the same corridor and my door is often open (5303).

Flo the Nursebot is an interesting new development. It uses speech recognition, synthesis, and face tracking, nominally intended as a "robotic assistant for the elderly." The 'lips' move when Flo talks, and the eyes track the face of whoever it thinks it's talking to.

Robotics still has a long way to go, but things are starting to get interesting. Robots get a lot more interesting to me when you can talk to them; sometimes I wish they would just shut up when I tell them to, though.

Re:Images from Nomad on Autonomous Robot Explores Antarctica · 2000-01-31 18:00 · Score: 1

Yep, that was Dante. There's still some stuff up at NASA about it. Actually the other thread on "sending a robot to Hell" reminded me of Dante, too...

Re:Stereoscopic vision? on Autonomous Robot Explores Antarctica · 2000-01-31 17:56 · Score: 2

This is actually a pretty straightforward computer vision approach. You use two cameras, and, since you've carefully calibrated the cameras and know how far apart they are, you can compute the disparity between the two images pixel by pixel. Since the cameras are in slightly different locations (separated by a fixed baseline and angle of difference), any disparity will be the result of the different angles of view of the two cameras.

One interesting point is that the farther apart the eyes are, the more sensitive the apparatus is. So one way to get better depth perception is to put your eyes out on stalks.

Here is a paper on fast stereo vision.
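The stereo geometry described above can be sketched in a few lines. This is a minimal illustration of the textbook pinhole model for two parallel, rectified cameras, not Nomad's actual code; the focal length, baselines, and function names are assumptions chosen for the example. Depth is focal length times baseline divided by disparity, and the depth error caused by a one-pixel disparity error shrinks as the baseline grows, which is the "eyes on stalks" effect.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters of a point seen by two parallel, rectified cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """Approximate depth uncertainty for a small error in measured disparity."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 800.0   # assumed focal length, in pixels
    target_m = 10.0    # assumed distance to the object

    for baseline_m in (0.1, 0.5):  # narrow vs. wide camera spacing (assumed)
        disparity = focal_px * baseline_m / target_m
        depth = depth_from_disparity(disparity, focal_px, baseline_m)
        err = depth_error(depth, focal_px, baseline_m)
        print(f"baseline {baseline_m} m: disparity {disparity:.1f} px, "
              f"depth {depth:.1f} m, error per pixel {err:.2f} m")
```

In this sketch the wider baseline gives a larger disparity for the same object, so each pixel of matching error costs less depth accuracy.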

Re:Java Speech? on CMU Sphinx Open Sourced · 2000-01-31 05:09 · Score: 2

That would be great. I think a little NMI work would get significant portions of Sphinx2 working with JSAPI.

Decoding from files, SGI on CMU Sphinx Open Sourced · 2000-01-31 04:44 · Score: 1

Yep, sgi_ad.c is a stub, but I just added sphinx2-test to the CVS tree, which calls sphinx2-batch to decode an example utterance. If you'd like to get sgi_ad to work :) just check out the current CVS tree and run ./autogen.sh, then ./configure, etc etc and look at sphinx2-test.

Re:How well it works: on CMU Sphinx Open Sourced · 2000-01-31 04:35 · Score: 1

You can build your own language models. Take a look at the Sphinx home page for a link to a web-based language model building tool.

Re:15 years?!?!!!? on CMU Sphinx Open Sourced · 2000-01-31 03:10 · Score: 3

The codebase has advanced considerably since Sphinx 1, and there have been a number of breakthroughs in the field since then. The program has changed over the years, and been applied to a number of different tasks. Furthermore, much of the time it's been used in whole systems, i.e., dialogue systems and natural language interfaces. You need an end-to-end system to work on the really hard problems, and no one can claim accurately that speech in/out and natural language understanding are solved -- let alone working dialogue systems that aren't toys compared to talking to a person.

So there you go -- there was a working version of the code long long ago, and it mutated as the demands of the field did; furthermore, it has been and continues to be used in larger end-to-end systems like the Communicator. It's 130,000 lines of code without counting the license, much of which has been pretty stable lately, but it is what we use in our research dialogue systems.

Re:Sounds pretty unethical on CMU Sphinx Open Sourced · 2000-01-31 01:42 · Score: 1

Yeah. That was pretty unfortunate. The wording on the post got people going on the (interesting) patent discussion, but I think it takes away from what a good thing this is.

OGI CSLU Toolkit is also Open Source on CMU Sphinx Open Sourced · 2000-01-31 01:36 · Score: 1

The OGI CSLU (center for spoken language understanding) also has an open source toolkit and language resources, but their distribution mainly runs on Win32. Good stuff; they use Festival and the group there has made some excellent contributions.

Re:Public funding, but not public software. on CMU Sphinx Open Sourced · 2000-01-31 01:17 · Score: 2

The license is actually almost verbatim Apache, based on BSD. And the only reason we wanted the "you have to mention Sphinx" condition is because there was once a (nameless) system (somewhere nameless) where someone (!) took the source and just erased the authors' names, and redistributed it. At least with this we can have an inclusion of the original by reference -- people can go and see the original.

We're also sensitive to the whole 'advertising clause' problem, so if the Apache terms turn out to be more trouble than they're worth, we could probably be talked into changing the license.

Re:Public funding, but not public software. on CMU Sphinx Open Sourced · 2000-01-31 01:17 · Score: 1

The license is actually almost verbatim Apache, based on BSD. And the only reason we wanted the "you have to mention Sphinx" condition is because there was once a (nameless) system (somewhere nameless) where someone (!) took the source and just erased the authors' names, and redistributed it. At least with this we can have an inclusion of the original by reference -- people can go and see the original.

We're also sensitive to the whole 'advertising clause' problem, so if the Apache terms turn out to be more trouble than they're worth, we could probably be talked into changing the license.

RE: what sourceforge said -- sourceforge gives you a menu of licenses, and BSD was the closest.

Re:but does it work? on CMU Sphinx Open Sourced · 2000-01-31 00:46 · Score: 2

Actually this version does not require training. The acoustic trainer will be released later, and we're looking to put in speaker adaptation shortly.

About accuracy, it is fiddly about the mic volume, and distance from your mouth. Try playing with that a bit. Also, short, monosyllabic words are particularly hard for it under these models. Try speaking normally and continuously (you probably already were).

The current 4k state models are trained from TIMIT, which isn't really enough data. We're in the process of building more, and we're hoping to get a process set up whereby we could distribute the cycles (Sphinx at home?).

Re:Training and Patents on CMU Sphinx Open Sourced · 2000-01-31 00:40 · Score: 5

At this point, we only have one set of broadband, 4k state models with the release. Our next step is to get a couple of sets of generic models for broadband and for telephone speech, and make a system for tailoring the generic models to specific language models.

We will also be releasing the trainer, and Sphinx 3, but it's coming out in steps. Sphinx 2 is the real-time engine, and while Sphinx 3 is more accurate, it's still slower.

As far as releasing data, we will be releasing whatever we can. It's OK for us to release models derived from data from, for instance, the LDC (Linguistic Data Consortium), because their licensing terms explicitly allow it, but much of our data comes from other sources. We'll be able to put some data out, but I think we'd be better off creating a public repository of contributed data, explicitly stating that all contributed data will remain free.

Re:Sounds pretty unethical on CMU Sphinx Open Sourced · 2000-01-31 00:19 · Score: 2

CMU Sphinx has no known Intellectual Property violations. This work is the result of a lot of work at CMU and involvement in publicly funded workshops. There are certainly no copyright issues (we wrote it) and we have no reason to suspect anyone has patent issues with it.

Comments · 26