Domain: ed.ac.uk
Stories and comments across the archive that link to ed.ac.uk.
Comments · 421
-
Re:Open Source, from the future... *wiggles finger
The voices aren't sexy yet but...http://www.cstr.ed.ac.uk/projects/festival/
-
Re:doh!
I also know that many visually impaired people use Emacs Speak (which supports Aural Style Sheets for web browsing)
Some sort of a sideline: Any idea if any software speech synths support aural CSS (emacspeak appears to be more suited for hardware devices)? All of the Linux TTS software seems to link to IBM ViaVoice which seems to be gone. I really love Festival (especially with OGI patch - the "mwm" is much clearer than Festival default sounds!) but Festival doesn't support aural CSS, just some markup called SABLE.
(I'm not blind, but reading looooong texts from display might make me one, and being an ecologically conscious wolf I elected not to print everything! =)
-
A misguided effort
Specs for the Simputer:
- 200 MHz CPU
- 32 MB RAM
- 24 MB storage
- 240x320 color display
- Price: $300
Specs for Microtel PC with 15-inch monitor:
- 800 MHz CPU
- 128 MB RAM
- 10 GB storage
- 1024x768 color display
- Price: $320
The specs of the Microtel PC are so much better, and the price so similar, that I wonder whether a desktop form factor would have been a better design choice. Obviously, the Microtel PC is not portable, but according to the article the computer would be used to "access the Internet, perform transactions, keep track of agricultural prices, and educate children". I don't think portability is a must for those functions.
The only real advantage that the Simputer provides is a built-in text-to-speech feature, but this could be added to the Microtel for free.
I'm not saying that aid agencies should be buying PCs from Wal-Mart and shipping them off to developing countries, but I do think the developers of the Simputer should have put their efforts into producing a similar desktop computer for the villages of, for example, sub-Saharan Africa. The smallest of these villages have no electricity, but many often do, as I learned from my recent experience in the Peace Corps. Thus the benefit of the Simputer's rechargeable batteries isn't really a huge advantage. And if, as the article claims, these villagers want to access the Internet, they're going to need a source of electricity for that anyway.
Even if the Simputer had hardware just as powerful as a desktop PC, there is still the problem of software. Most software today simply cannot run on a 240x320 display. All of these educational and business-transaction programs that the article talked about would have to be redesigned especially for the Simputer. On the other hand, a desktop computer with a full-size monitor opens up the entire world of existing applications. Also, by learning how to use standard desktop computers (and standard software like word processors and spreadsheets), the user is doing more than just calculating the price of his crop. He's picking up an additional skill - computer literacy - that can be applied elsewhere. That's something I tried to accomplish during my Peace Corps service.
But then, that's the real problem, isn't it? Not computer literacy but basic literacy. Providing a Simputer to developing countries is treating the symptoms of the disease, not the cause. These folks need jobs, skills, and education, not processing power. In the village where I worked as a Peace Corps volunteer, the average teacher's salary was around $75 per month. For the price of one Simputer, a village could hire someone to teach reading skills to an entire class for four months.
The Simputer may be a good idea for a few select cases, but overall I think it's a misguided effort.
Trevor
-
This site clears this topic up quite nicely
Subject says it all.
http://www.geo.ed.ac.uk/home/scotland/britain.html -
Re:why?
-
America is so new
My university was founded in 1583, and there are others in this country which were around at the time when your country was producing maps like this
:) -
Re:So what is faster than it in the TRIAD?
This page has results for the T3E, but no STREAM TRIAD for the SP.
-
Speech != language
What kind of competitive advantage would speech have offered for early humans, if language did not already exist? Language consists of much more than the production of words. You also need to be able to parse sentences, to "reverse-engineer" the grammar of your parents' language before you can start producing sentences of your own. This raises the question of whether parts of the brain have evolved "for" grammar (a hypothesis supported by Noam Chomsky and argued by Steven Pinker in his excellent book The Language Instinct), or whether existing pattern-recognition and planning mechanisms turned out to be useful for language, influencing the form and scope of all subsequent languages (suggested by Mark Steedman among others).
It's even possible that complete languages existed before humans were able to speak. American Sign Language is an example of a language with its own complete, unique grammar and morphology, which does not make use of speech. (See Pinker's book again.) Its existence supports the hypothesis that the parts of the brain responsible for language can operate independently of the parts that co-ordinate speech. In summary, there is a lot more to language than co-ordinating the muscles of the mouth and throat.
-
Ability to speak
As I recall, Mr. Hawking had a system that would allow him to very slowly link letters together into words. If you are interested, or if a few people here can help, you can give your friend the power of speech.
Take the Dasher interface mentioned a while ago and link it to a text-to-speech engine (Festival) to give your friend a fairly quick and easy way to talk.
The Duck
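A minimal sketch of the second half of that idea, assuming Festival is installed and that whatever text the input method produces can be handed to a Python function. It relies on Festival's `--tts` mode reading text from standard input; the `runner` and `which` parameters are hypothetical hooks added only so the function can be exercised without a sound card. How Dasher would deliver its text is not covered here.

```python
import shutil
import subprocess


def festival_command():
    """Return the argv that puts Festival into text-to-speech mode."""
    return ["festival", "--tts"]


def speak(text, runner=subprocess.run, which=shutil.which):
    """Send text to Festival on standard input.

    Returns False without doing anything when no festival binary is found.
    """
    if which("festival") is None:
        return False
    runner(festival_command(), input=text.encode("utf-8"), check=True)
    return True


if __name__ == "__main__":
    if not speak("Hello, this sentence was typed one letter at a time."):
        print("festival not found on PATH")
```

Each completed sentence from the input method would be passed to `speak()` in turn.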
-
Re:DIY :)
In that regard...
I've written up a little perl script which fetches a slashdot story and converts it into a SABLE document - XML specific to text-to-speech synthesis applications. I use Festival for speech synthesis - with a British accent, I can have my computer read slashdot to me in the morning...
The script basically converts slash's Light mode into something more conducive to tts purposes. It also has a substitution list to help the tts engine pronounce words correctly (how do you think the computer would pronounce CmdrTaco?) Other than that, I've been constantly amazed at how well the process works.
If anyone's interested in the script, feel free to email me.
-Karl
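The original Perl script is not shown, so the following is only a rough Python sketch of the two steps described: a substitution list for awkward spellings, then wrapping the text in a SABLE document. The substitution entries are made up for illustration, and the DOCTYPE header is reproduced from memory of the Festival manual's SABLE example, so check it against your own Festival installation before relying on it.

```python
from xml.sax.saxutils import escape

# Hypothetical substitution list: written form -> what the synthesiser should say.
SUBSTITUTIONS = {
    "CmdrTaco": "Commander Taco",
    "IIRC": "if I recall correctly",
}

# Assumed header, modelled on the SABLE example in the Festival manual.
SABLE_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN" '
    '"Sable.v0_2.dtd" []>\n'
)


def apply_substitutions(text, table=SUBSTITUTIONS):
    """Replace hard-to-pronounce spellings with speakable ones."""
    for written, spoken in table.items():
        text = text.replace(written, spoken)
    return text


def to_sable(paragraphs):
    """Wrap plain-text paragraphs in a SABLE document, one pause between each."""
    spoken = (escape(apply_substitutions(p)) for p in paragraphs)
    body = "\n<BREAK/>\n".join(spoken)
    return f"{SABLE_HEADER}<SABLE>\n{body}\n</SABLE>\n"


if __name__ == "__main__":
    print(to_sable(["Posted by CmdrTaco.", "IIRC this was covered before."]))
```

The resulting file can then be passed to Festival's text-to-speech mode.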
----------- -
I agree with this post.
I wholeheartedly agree with you. Festival is a particularly good text-to-speech program. It sounds like an English butler, so you will feel wealthy and pampered while listening to it.
Another benefit of this is that it can be modified with minimal effort to give you audio versions of not just any web site, but any plain text source, whether it be email, your grocery list, your "to do" list (so you can get in the right mindset before arriving at work), yesterday's server stats.
I first learnt of text-to-speech when my uncle lost his eyes in a fishing boat accident. He's a computer enthusiast, and I was soon impressed with his neat new software. Since then, I've noticed that many of the "accessibility" mechanisms put in place for the handicapped can be beneficial to normal people as well. I frequently browse the 'web with no images, and just use ALT tags (intended for the blind). Certain city intersections that "chirp" for blind people when the light is green allow me to cross the street while reading. Wheelchair ramps are easy on the knees, and handicapped parking spaces are usually open and very close to my destination. Text-to-speech may be your first step into a larger, more convenient world. -
Re:I can just imagine
I'm sure something like that could be done with Festival, a free open source TTS app... sounds like fun! Then again, if your kernel is borked...
:-D -
Re:curiosity?
Jet engines have an RPM of 30,000 or higher, plus they get birds sucked in there. I don't think platter manufacturers have really tested the limits, such as by using jet engine materials
These energy levels are used every day. Face it, it's easy to die - ever seen a biker's piston fly out of his engine at 7500rpm? I'm telling you bikes should have auto gearboxes; many still have manual, and this piston will give you more than just a sore ass. Jet engines, despite having to be light to minimise fuel consumption, can still lose turbine blades after sucking in birds without throwing the blade out of its casing and through the cabin, slicing the plane in half. Face it, it's just another way to die, but still, with the right media hype I think we'll see aluminium cases becoming illegal in California.
-
DTDs, Schema, and XDR
Actually, if you check the source, you'll see that they are using XML namespaces and schemas. Actually, they're using something called XDR (XML-Data-Reduced) which was developed by Microsoft and is upwards compatible with XML schema. I'm familiar with schema but not XDR. For more information, you may want to check out these links:
- http://www.schemavalid.com/faq/xml-schema.html#a4
- http://www.netcrucible.com/xslt/msxml-faq.htm#Q13
- http://www.ltg.ed.ac.uk/~ht/XMLData-Reduced.htm
- http://www.w3.org/TR/1998/NOTE-XML-data/
-
Use a really simple format e.g. PPM
If you're really bothered that JPEG will become unreadable - something I doubt, given that it is an open format - try something like PPM. Straightforward RGB raster format, and you could even put a description of the format in the comment field at the top of the file!
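To make the suggestion concrete, here is a small sketch of a plain-text (P3) PPM writer that puts a description of the format into comment lines near the top of the file, plus a reader that skips those comments. The function names and the wording of the embedded description are illustrative only.

```python
def encode_ppm(pixels, max_value=255):
    """Serialise rows of (r, g, b) tuples as a plain-text P3 PPM image."""
    height = len(pixels)
    width = len(pixels[0])
    lines = [
        "P3",
        "# Plain PPM. After the magic number come width, height and the",
        "# maximum channel value, then one R G B triple per pixel, row by row.",
        f"{width} {height}",
        str(max_value),
    ]
    for row in pixels:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


def decode_ppm(text):
    """Parse a P3 PPM back into rows of (r, g, b) tuples, ignoring comments."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())  # drop everything after '#'
    if tokens[0] != "P3":
        raise ValueError("not a plain PPM file")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(t) for t in tokens[4:]]  # tokens[3] is the maximum channel value
    rows = []
    for y in range(height):
        start = 3 * y * width
        rows.append([tuple(values[start + 3 * x:start + 3 * x + 3]) for x in range(width)])
    return rows


if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (66, 66, 66)]]
    print(encode_ppm(image))
```

Because everything is whitespace-separated decimal text, a future reader needs nothing more than the embedded comment to reconstruct the image.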
-
How my company does it:
We fire up script(1) every time an admin procedure gets started, so we document every line that appears on a terminal. Later we add commentaries for each of these lines, explaining its purpose, and archive the whole file.
With this you can infer what a specific environment looks like, how installations were dealt with or problems were solved, amongst other administrative duties.
I guess the only disadvantage is that all your administrators will have to learn how to get away without emacs and vi, since they usually don't do well with script(1). Of course, there's always ed(1). -
Re:How appropriate...
I think Lagavulin is nicer -- Laphroaig is a bit, er, over the top. Lagavulin still has the strong Islay character (think ultrasmoke), but stops short of whacking you over the head with a crowbar; instead, it... seduces you. Christ I can't begin to explain how perfect the stuff is. Just try it.
-
Death of the music industry
This is an important idea that doesn't work very well yet. When it does, it will kill the music industry.
First, computer-generated singing from MIDI files can be done better. Listen to Festival Singer, from the Oregon Graduate Institute of Science and Technology, which is in turn based on a speech system from the University of Edinburgh. It's still not that great, but progress is being made. They're approaching the garage-band level.
More components are needed to make computer-generated music more human-like. Some of that work has been done. The Media Lab system for Expressive Performance Extraction takes in a MIDI file and an audio recording of piano music, and builds a model of the performer's expression. This model can then be used with other MIDI files to mimic the specific pianist.
The next big step is to do that for singers.
The goal is to have a system where you put in a MIDI file, lyrics, performer and singer models, and push start. Out comes a performance that sounds like a good backup band.
Because the music industry likes to have the option to replace performers, copyright law doesn't prevent doing this on popular music. You only have to pay a modest statutory royalty to the original songwriter.
Once this works, it could make a real dent in the music industry. Performers could go the way of orators. People would still go to live performances, but we could dispense with much of the recorded music industry.
-
Our experiences over 5 years...
We, that is two of us, have been doing this since 1997. Our site Internet Technical Documentation Archive (ITDA) houses a lot of freely available Field Service Manuals.
We started by borrowing local scanning resources and flipping pages manually. That's one page every 5-6 minutes! Then we bought our first LPT scanner, which was a little faster but ate pages....
...ack depends on what you want to do. Like most people say do you want to totally destroy your books? How much do you want to spend? Are you ever going to use those physical books again?
If it's just a low-cost personal copy with reasonable quality and you have LOTS of time then.... just grab a copy of OmniPage OCR v11, a HP ADF scanner [ hp scanjet 5490cxi (C9863A)], a copy of Adobe Acrobat and get a professional company to despine your books.
We spent a total of $800 on software/hardware to do this. We spend, on average, about 50 - 200 hours per book to process it - that's scanning, OCR, OCR proofing and format rework and then final PDF output. Some of the books we're doing I have given to students to work on. They'll do it for next to nothing ;-)
It's possible to outsource this work to companies. For example Crowley do this and they also handle large documents. You have to be aware of how they are going to process your book and the copyright problems. However, as someone said, some don't care about copyright and some do (eg Kinkos). Again this comes down to whether you care about the books and how much you wanna pay for a digital copy...
In our case we don't make money off this site so we can't afford to out-source. So our biggest problem now is how we are going to get the over-size PDP-11 documents into PDF. The Minolta PS7000 looks like the beast we need but it's way too expensive for a non-profit. We'll probably be out-sourcing and eating the costs.
My suggestion is to either go the HP scanner+Omni+Adobe PDF route OR out-source it if you can afford it. At least with the out-source option you get to keep your books intact.
ITDA Team -
This was my final year project thesis
This was my final year project thesis. Just remember the golden rule: unstructured 2 structured == convert 2 XML. I wrote a [very bad] program in C++/Perl/tcsh (IPC via pipes) to add XML tags to English text and then index it into a search engine, which would use the linguistic data stored in the XML tags to help the search.
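The thesis program itself is not shown here, so as a toy illustration of the same idea (tag English words with XML, then index on the tags) here is a short Python sketch. The tiny `LEXICON` stands in for a real part-of-speech tagger, and the element names `s` and `w` are invented for the example.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Toy lexicon standing in for a real part-of-speech tagger (an assumption).
LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN", "chased": "VERB"}


def tag_sentence(sentence):
    """Wrap each word in a <w> element carrying its part of speech."""
    root = ET.Element("s")
    for word in sentence.split():
        token = ET.SubElement(root, "w", pos=LEXICON.get(word.lower(), "UNK"))
        token.text = word
    return root


def build_index(documents):
    """Map (word, part of speech) pairs to the ids of documents containing them."""
    index = defaultdict(set)
    for doc_id, sentence in documents.items():
        for token in tag_sentence(sentence).iter("w"):
            index[(token.text.lower(), token.get("pos"))].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "The dog chased a cat", 2: "A dog barked"}
    print(ET.tostring(tag_sentence(docs[1]), encoding="unicode"))
    print(sorted(build_index(docs)[("dog", "NOUN")]))
```

A search front end could then restrict a query to, say, nouns only by looking up `(word, "NOUN")` rather than the bare word.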
NIST does a MASSIVE competition on this annually. I don't want to be an XML-buzzword whore <Arnold Schwarzenegger accent> (XML commando eats Green berets, C++, Java, Perl, COBOL for breakfast)</Arnold Schwarzenegger accent> but you can't beat XML for easily converting anything that you can make sense out of into computer readable format. Real h3cKoRs use SGML, but us underlings have to stick with things we can understand like XML. As for expandability, if we want to encode something else into the document, then just tag-it-and-go
It took me 200 hours to fish out all these links (before the Google days), I don't want anyone to have to waste as much time as I did feeding the search engines exotic foods. It's a year old so pardon me for the odd broken link, armed with these you could probably turn jello into XML ;-)
My favourite bookmarx
PROJect[21 links]
Beginners' Guide[13 links]
Berkeley Linguistics Dept. Course Summaries, general stuff
Cryptic IR Vocabulary defined
Explanations of weird words like hypernym
How do we produce and understand speech
How Inverted Files are Created - University of Berkeley
NLP Univ. of Indiana, very good basics e.g. word sense d
Simple language - useful....
What is Natural Language Processing, links
What is POS tagging........
Word Sense Disambiguation defined
Word Sense Disambiguation in detail, scroll down far
Word Sense Disambiguator - LOLITA (tested at MUC-7 and SENSEVAL competition as best)
XML for the absolute beginner
HTML, XML stuff + parsers[19 links]
Apache plug-in that uhhh does stuff with XML
Convert COM to XML
convert XML, HTML to Unix pipeable formats
converters to and from HTML
expat XML parser
HTML Tidy - converts HTML 2 XML + source code!!
Parse DB (RDBMS, whatever) to XML
Perl-XML Module List
PHP Manual XML parser functions - what the hell are they talking about, PHP Virtual M...
Public SGML-XML Software
Pyxie - XML Processor for Python, Perl, etc.
SGML+XML tools.org
The XML Resource Centre - massive number of links
W4F wrapper - wrapper converts XML to HTML
XFlat - convert flat file into XML
XML Parsers and other XML stuff
XML.com - Parsers, etc.
XML-Data Catalog System - uhhhh looks close
XTAL's general converter - convert anything 2 XML
other Background[8 links]
Is Linux ready for the Enterprise, scalable...
Linux reliability
Linux Versus Windows NT, Mark(sysinternals bloke)
PC reliability (pcworld)
SPEC - Standard Performance Evaluation Corp.
Systems benchmarks
TPC - Transaction Processing Performance Council
Unix Beats Back NT In EDA Workstation Arena
Proper TREC(-8) QA systems[2 links]
pg. 387 LIMSI-CNRS pretty deep parsing[2 links]
More links....
NLP, IR links - lots to corpora, etc.
pg. 575 U. of Ottawa and NRL (shit system, got 0%)[1 links]
LAKE Lab
pg. 607! University of Sheffield (crap system, but OPEN SOURCE!)[2 links]
GATE - FREE IE app w`source code
LaSIE - ER, coreference, template (cv)
pg. 617 Univ of Surrey (inconclusive matches)[2 links]
System Quirk - Or is this their search system..... Hmmmmmm
Univ of Surrey - pointers (hopefully this is their WILDER search system...)
SMU - Pg. 65[1 links]
Natural Language Processing Laboratory at SMU
Textract[2 links]
Cymfony - Technology
Textract - State of the Art Information Extraction
Xerox uhhhhh maybe[1 links]
Xerox Palo Alto Research Center
(OVERVIEW) 1999 TREC-8 Q&A Track Home Page
NLP bloke, Univ Sussex
Tcl-Tk[4 links] Tcl tutorial
Tcl-Tk Contributed Programs Index
Tcl-Tk Resources, sources
TclXML - manipulating XML using Tcl-Tk
Artificial Natural Language - Is this what I'm trying to parse into...
Comparison of Indexers - Prise vs. Inquery vs. MG, etc.
Eagles - Language Engineering Standards
Language Technology Group - lots of modules!
LDC - Linguistic Data Consortium, lots of corpora
Lexical Resources
Links 2 resources, indexers.....
Lots of IR stuff, University of uhhh
Managing Gigabytes Indexer
Managing Gigabytes Manuals and stuff
Htdig search system
NLP & IR (NLPIR, NIST) Group
OVERVIEW OF MUC-7-MET-2
Perl XML Indexing - XML search engine type thing
Phrasys Language Processing Software Components (money)
QA HCI bullshit
SIGIR - TREC-type thing, resources
SMART indexer system documentation
Text REtrieval Conference (TREC) Home Page
The Natural Language Software Registry
Thunderstone IE and IR products
WordNet - FREE DOWNLOADABLE lexical English database
Page created with URL+, nice utility for working with internet shortcuts -
Re:The Answer to the universe
I suppose you mean the RGB hex color #424242.
-
Corporate Intranet Index Engines?
When I was an Intranet webmaster at Motorola, we used 'FreeWAIS' for Intranet indexing, until Corporate security decided that indexing everything was a security risk :-)
Not kidding. I work for a very large multinational and the corporate search engine is an exercise in frustration. Its purpose in life seems to be to return bizarre and obscure documents as the results of its searches.
You actually got results returned from your search server?
Lucky bastard. Our corporate Intranet search engine usually would just return 'Query Timed out'. Eventually they just took the search boxes off all the web pages.
I've since built a simple Harvest index for the Intranet.
It can be very interesting finding all of the 'cobweb' documents on intranet sites. Ancient documents relating to projects and managers long since vanished among other stuff that management would prefer to see forgotten...
There are some cool features that are unique to Google, but I'm not sure if 'Convert PDF to HTML' and 'highlight search terms' are worth $20K.
-
Aiya Ambar
Just found this
-
Re:magic numbers + the synth sux
I was disappointed by the bad speech encoding. I had expected in 2002 you'd actually be able to synthesize a voice that sounds close to human or at least be understandable. The old amiga 500 had a utility that was much more understandable than this is.
I don't know about the Amiga but I had an old TI99-4A that had a speech synthesis module. It was quite good at reading most words but had a built in list of words it could read. You could get it to read other words but it meant that you had to express the word in a special way so that the module could pronounce it properly. That really defeats the point of text to speech.
I think text to speech has come on a long way since those days but it seems like slow progress which is due to the complexity of the subject. There is a good open source text to speech engine called Festival. You can test it with your own text here. -
My experiences with 'cheat-detection'
In my first year of university they had the bright idea of running some plagiarism detection software against our class's submissions. I believe 127 people were accused of cheating by the CS department - including me.
I was sent a letter telling me that I had been accused of conspiring with one other person and consequently my mark would be halved.
Naturally I was outraged and got on the phone to the head of department. He explained that my submission was unacceptably similar to one other person's and either someone copied it or we had collaborated - I hadn't collaborated, copied or let my work be copied.
I arranged to meet with the course organiser and they showed me both submissions. Mine had originally been given 34/35 and the other had been handed in 2 weeks late and even then given 0/35. The other submission looked virtually identical to mine but had oddities like capital I's as loop control variables (suspiciously as if it had been typed into M$ Word). My guess is that he'd picked my code up from the recycle bin in the lab and typed it in.
However faced with this, they still argued that I could have allowed this person to copy my code (even hinting that I might have accepted payment for it) and if I had any further evidence to prove my innocence then I should draw it to their attention.
My father and I responded that it wasn't right that I should have to prove my innocence since it's a basic human right to be presumed innocent until proven otherwise. We suggested we would seek legal counsel, and they were quick to write back reinstating my original mark.
What frustrated me further was that the other party involved (who was never identified to me) was punished equally - by having his mark of 0 halved!
Cheat detection systems are fine as a mechanism to prompt staff to possible problems but they certainly shouldn't be used as the judge and jury.
Given that CS typically has large class sizes - mine was over 300 at one point - and CS assignments are often quite short and often closely related to textbook examples ... it's infeasible to hope that no two students will produce very similar results.
The other thing that's NEVER been made clear to me is the distinction between permitted collaboration and plagiarism. Every university document is fairly vague about what's acceptable and what's not. And as one of my other professors put it - "In the real world before you embark on any assignment it's worth asking, searching, begging and borrowing as much of it as possible" -
that nottingham software..
caught loads of cheats at edinburgh university a few years ago. It works, then.
(for those interested, the stunning ms roe subsequently left scotland in a huff)
-
Re:Remember HBO?
A similar form of satellite sabotage occurred to the Kurdish Television station Med-TV in 1995. This press release from the London-based Kurdish station which broadcast to all of Europe, the Middle East, and North Africa on Eutelsat describes how a live program was taken off of the air by a second, more powerful uplink. It states:
the second carrier responsible for the jamming was spotted after MED-TV's own transmission carrier was intentionally dropped to identify the origin of the pirate interception. Secam colour bars, implying deliberate jamming, were momentarily seen, but an identification of origin could not be made. MED-TV's satellite service providers suspect sabotage, and official sources say the cost to jam MED-TV's signal would be approximately a quarter of a million pounds, as it requires the use of high calibre technology. MED-TV's engineers believe that, given its nature, the intercepting signal was transmitted from a European country.
Clearly, whichever group was responsible for the jamming of the Med-TV broadcasts (it occurred regularly during the existence of the station) had the equipment, scientists, electricity, and monetary resources to exploit the weak satellite security. While I am unsure whether the culprit was ever identified, most news reports assumed that the Turkish government was the likely suspect. Dr. Amir Hassanpour writes in A Stateless Nation's Quest for Sovereignty in the Sky that, given the timing and other evidence, Turkey's motivation for sabotage was political. The 2002 World Press Freedom review also identifies Turkey as the high-tech criminal where it writes: "Med TV's news and panel discussion programmes have been targeted by jamming signals emanating from Turkey, while in Kurdish regions of Turkey, satellite dishes and antennae have been prohibited and destroyed by soldiers and police units."
I suppose the point I am trying to make clear is that hacking of satellites will likely occur for politically motivated reasons, as opposed to curiosity or to fill a cracker's boredom. The consequences if caught and prosecuted are severe, and success requires satisfying a complicated, dynamically changing equation of location, resources, knowledge, and timing. -
Re:Genetic Algorithms are not new
I believe one use that has been found for them is in creating exam timetables; you have a clear set of guidelines (i.e. you want these exams spaced out, these cannot clash etc) and you leave a computer to work them out. IIRC, Edinburgh University uses a program using GAs for this very purpose.
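For readers who have not seen one, here is a deliberately tiny genetic algorithm for the "these cannot clash" half of that problem. It is a sketch only and says nothing about how the Edinburgh system actually works: the exam names, population size, mutation rate and the choice of uniform crossover are all assumptions, and spacing constraints are left out.

```python
import random


def count_clashes(timetable, students):
    """Sum, over students, of exams that land in a slot that student already uses."""
    clashes = 0
    for exams in students:
        slots = [timetable[exam] for exam in exams]
        clashes += len(slots) - len(set(slots))
    return clashes


def evolve_timetable(exams, students, n_slots, generations=200, pop_size=30, seed=0):
    """Evolve an exam -> slot mapping; returns (best timetable, best score per generation)."""
    rng = random.Random(seed)

    def fitness(timetable):
        return count_clashes(timetable, students)

    population = [{e: rng.randrange(n_slots) for e in exams} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        if history[-1] == 0:
            break
        survivors = population[: pop_size // 2]  # elitism: the best half always survives
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            # Uniform crossover: each exam inherits its slot from one parent.
            child = {e: rng.choice((mum[e], dad[e])) for e in exams}
            if rng.random() < 0.3:  # mutation: move one exam to a random slot
                child[rng.choice(exams)] = rng.randrange(n_slots)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness), history


if __name__ == "__main__":
    exams = ["maths", "physics", "chemistry", "history", "french"]
    students = [["maths", "physics"], ["physics", "chemistry", "history"],
                ["maths", "french"], ["history", "french"]]
    best, history = evolve_timetable(exams, students, n_slots=3)
    print(best, count_clashes(best, students))
```

Because the best half of each generation is carried over unchanged, the best clash count can never get worse from one generation to the next.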
Also, a lot of what is being discussed sounds like Neural Networks as well; gates interlinking and 'learning'. I found it interesting during my MSc, and the field shows some promise if they can get over the factor discussed of "how do you trust something you can't explain?"
-
Re:The Next Step
What with all the automobile modifications for running computers, the next logical step is to simply replace the windshield with a monitor.
Yeah, like in the old "Spectrum Pursuit Vehicles"!!! (From this page)... -
Want to help your fellow Linux gamers?
Linux Castle Wolf rocks, but be aware of the following limitations:
1) The current binaries are multiplayer full version only. No single player.
2) The game must be installed under Wine or Windows. Which sucks...
Now for the interesting bit:
Everyone will love you if you fix this piece of software (or make your own) to work with modern Wise Installer archives.
The Wolf3D CD contains a 500MB win32 setup.exe file in which the necessary pak files for the Linux install go. This is in Wise Installer format, which is similar to zip with different headers. Coders - if you can work out a way of extracting this archive under Linux, you have my eternal respect and the love of Linux Gamers everywhere. It should be pretty easy from my research (the software mentioned is non free qmail license source code) for someone with the skills, but I'm still teaching myself Kernighan and Ritchie.
-
Re:Look into JunkBuster
google to the rescue:
popup filtering junkbuster
ain't the GPL great? -
A few ideas
Well, there are a few speech synthesis programs that are quite nice, Festival (good) or IBM's ViaVoice (excellent) for example.
However, only a few applications support speech devices. But since it's possible to use many applications in plain text mode from a VT102 terminal (pine for e-mail, editors, links for surfing etc.), wouldn't it be great if someone developed a braille display that you hook up to a serial port and that replaces the screen?
(Textmode rules! I do 70% of my computing on the VT102 terminal in my living room.)
I believe that there is some support for speech devices in the kernel as well, unless I'm wrong.
Furthermore i'd like to direct you to BLINUX
(I use viavoice to read me a bedtime story every now and then, but found out that a Mommy is better at that - afaik she never kept on reading after i fell asleep) -
Re:Mac OS has that
Actually, on a more serious note, is there anyone working on an open source speech synthesis project?
Yup; it's called Festival. -
Re:Mac OS has that
Yep, it's called Festival, and the results are pretty decent. Became free as in speech a couple minor versions back, too.
-
Re:Announcing open source VoiceXML interpreter
NOTE: This is a VoiceXML interpreter. A real system would require a full speech recognition engine and a full text-to-speech implementation.
This is of tremendous importance to a project I'm working on with my university. We are setting up an online testing system for rehabilitation training, and one of the issues we've been having is deciding what features we can use that will work with text readers. If I can get it set up correctly, I'm going to show off a kiosk system using OpenVXI and the open-source projects Festival for speech synthesis and CMU Sphinx for speech recognition. -
Anteriority
Edinburgh's Centre for Speech Technology Research got there far earlier:
http://www.cstr.ed.ac.uk/projects/sable/
http://www-2.cs.cmu.edu/~awb/festival_demos/sable.html
-
Re:Same problem with 800 phone numbers?
Almost a decade ago, as part of the whole USL vs. BSDi lawsuit (see the section "The Lawsuit" in McKusick's history of BSD for details), one of the complaints was for trademark infringement for advertising their phone number as "1-800-ITS-UNIX".
-
Re:Not to rain on anyone's parade...
A proper voice recognition system should be able to understand any words in the English language... the chances are this system is simply used to control a few Palm commands and therefore the incoming speech patterns only need to be compared to a few stored patterns. Then a system of pre-synthesising the outgoing speech would reduce further the demands on the CPU but use more disk. I have my Pentium 75 talking to me using the University of Edinburgh's Festival system on Linux by pre-synthesising the most important words.
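One way to picture the pre-synthesising trade-off (less CPU, more disk) is a small on-disk cache keyed by phrase. This sketch assumes Festival's `text2wave` script is on the PATH and accepts text on standard input with `-o` naming the output file; the `PhraseCache` class and its method names are invented for illustration and are not taken from the poster's setup.

```python
import hashlib
import subprocess
from pathlib import Path


def text2wave(phrase, wav_path):
    """Render a phrase to a WAV file using Festival's text2wave script."""
    subprocess.run(["text2wave", "-o", str(wav_path)],
                   input=phrase.encode("utf-8"), check=True)


class PhraseCache:
    """Synthesise each phrase once and reuse the stored audio afterwards."""

    def __init__(self, directory, synthesise=text2wave):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.synthesise = synthesise

    def path_for(self, phrase):
        # A hash keeps file names safe whatever characters the phrase contains.
        digest = hashlib.sha1(phrase.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.wav"

    def get(self, phrase):
        """Return the WAV path for a phrase, synthesising it only on first use."""
        wav = self.path_for(phrase)
        if not wav.exists():
            self.synthesise(phrase, wav)
        return wav
```

Calling `get()` for the most important words ahead of time leaves only file playback to do when the machine needs to speak.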
By the way, the festival system is excellent and takes under ten minutes to download, compile and install!
-
Re:Genetic Programming
I read somewhere (can't find the reference right now, sorry) that some work was being done whereby the genetic programs were being evolved that could themselves create neural networks.
you mean something like the SGOCE programs? citeseer is your friend. I implemented a modified SGOCE algorithm to evolve controllers for a simulated walker. go here. I'll get someone to read this if it kills me.
-
article didn't mention GA evolution of neural nets
I want someone to read my thesis. It seems a shame for it to go to waste after all that work - C++, Python frontend and GPL, what more could you want? Go on, please take a look.
-
Actors are safe
I don't think actors are going to lose their jobs on this one. The time it takes to get a synthesised voice intoning correctly (using phoneme manipulation) is prohibitive, when I can say to an actor "Let's read that again, but this time be more scared".
Actors, for all their foibles, spend many years in formal or industry training, learning how to mimic a person / emotional state. A good actor is (if you'll forgive the horrible cliche) like clay in a director's hands. An average TTS could never be as flexible, as quickly.
As an aside, I've played with the AT&T TTS engine (which has been sitting as a useable demo since at least April) and with the festival TTS system and festival seems to be more on track to genuine speech reproduction, although I still have yet to hear a convincing TTS rendition of any form of performing art. We shall see.
-
One more step...
Prior to this, the best sounding speech synthesis I had heard was from the Festival system, which is still pretty good - especially considering it has an open source license, something the AT&T system doesn't.
Another good speech synthesizer, possibly an early version of the AT&T one, is by Lucent.
Still, I am amazed at the quality of the AT&T system - it sounds almost perfectly natural. To the naysayers that say "No, it isn't natural" - what all of you have to realize is that this simple demo doesn't allow you to tweak all the variables that would really allow the inflections or type of voice (like whispering, etc) to really come through - it is too bad they don't give an advanced interface with a FAQ or some other form of documentation to allow this, but I imagine that if they did, it would probably take quite a while to compose even a simple sentence (I remember the hell you had to go through with an old Radio Shack speech synth for the Color Computer, specifying individual phonemes just to get proper speech to come out - it could pronounce many words, but on others it just fell flat on its face).
Finally - something I want everyone to ponder. Take a look at this old article (it was about Square redubbing FFTM) - once it loads, search for "cr0sh" and "I dare say" - you will come across a series of comments about what I think may happen in the future - what is funny is that the comments in reply to my take on things sound like your typical naysayers. How many computers were we supposed to only need back in the 60's? How much memory would people "only" need again Mr. Gates?
What I predict will come about - probably sooner than we can all imagine. It may not be cheap enough to do it now, at a quality that people would watch, fast enough to be done quicker than what can be done with live actors - but it is all software and hardware - this stuff will get faster and cheaper. Anybody who has been in this business long enough knows that it will happen. There might still be a need for actors, and voice artists, and such - but they probably won't have the "god" status society seems to confer on them now (with the exception, perhaps, of stage acting - which will probably enjoy a huge comeback).
Worldcom - Generation Duh! -
Other Online Demos
Some links to other online demos, so you can compare:
http://www.elantts.com/indemo.htm
http://www.cstr.ed.ac.uk/projects/festival/userin.html
http://www.flexvoice.com/demo.html
http://www.acuvoice.com/downloads/ttsdemo.html
I searched for good TTS software to give voice to some of the 3d animations I did in max ... but I did not find anything satisfactory... :( -
Not the first laptop
The Epson HX-20 came out before the Model 100. Anti-Microsoft conspiracy buffs will note that the built-in software in the Model 100 was written by Bill Gates, so maybe that explains the revisionist history.
-
For those of you who are knocking the concept ...
... I would refer you to this article which discusses structured environments. Basically, to be efficient at something, e.g. preparing a dish, we line up all the instruments in a particular order to help cue us when a particular task needs to be done. Extending this into a semi-spatial setting (if ESR can get the right relationship of in-on-up-etc. representing the true connectivity between kernel modules) would help people orientate themselves and get to work faster (so you can get back to that QUAKE game). Just like we would put a letter by the door, or a shopping list by the car keys, each cue triggers associated memories and reminds us of specific actions that need to be done. Sure, you can have a linear checklist like the space shuttle pilots, but anything computing-wise is so variable that a more flexible arrangement is desirable. I've been looking into something similar for the make processing for reproducible documents and it is not as easy (or trivial) as people think (at least to get right). For example, color assignment: do you map this property to time-of-last-modification (i.e. heat colors) or to likely hazard (red=stop, orange=hazard, yellow=caution, etc.)? If you dig far enough, you eventually realise it is actually a text variant of scientific visualisation but in the qualitative domain. There are a number of theories on how you allocate the properties based on human cognitive functions (see Lloyd w.r.t. OpenDX and fiber bundles), but there are a lot of issues remaining, such as what works for 5-6 objects not working for thousands. Like all ideas, let the users decide and if it works, it will be included in the meme-pool. Imagine TuX, the penguin avatar, trundling around kernel space looking for fish and fixing security leaks. At worst, it will provide a few minutes of amusement. LL -
Festival
The Festival speech synthesis system page also has a web-based text-to-speech converter here, without the filter. It's free software, and does a pretty good job.
The automatic voice pitch is pretty neat; I built a hardware text-to-speech converter around 10 years ago, and it only produced a monotone voice that got pretty annoying after a while. Don't feed Festival raw HTML documents, though - it can cause the voice to get deeper and deeper until it has to reset the pitch. ;-) -
Re:64MB for power, or to cover crappy programming?