Oh, wait, I don't work in that business anymore. Never mind.
That said,
It might be difficult to scroll through the list to find a particular song using those tiny buttons
List selection is one of the areas where speech recognition can really shine. The recognition task is usually fairly easy (or, in the case of phonetic ambiguity, impossible), and it fills a real gap in the other available interfaces. On the down side, though, when it goes wrong it's a pain to correct a mistake. "No, I meant that other one of the six thousand items in the list."
Voice recognition on computers has been around for a while now with products like Dragon, Via Voice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.
This kind of thing comes up every time speech recognition is mentioned here, and it's largely missing the point. Desktop speech recognition, as handled by Dragon NaturallySpeaking, is a very different problem from simple commands and list selection, and it has very different solutions. If you have to recognize and transcribe arbitrary sentences in a given language you have to handle a much larger search space in basically every dimension -- so much larger that the optimal search techniques can be very different, and (as in your comment) the resources required to implement those techniques will be incomparable.
I won't say the problems are fundamentally different, because the fundamentals are much the same between the two domains; but nearly every detail of the implementation of those fundamentals is likely to be different.
Does the voice recognition filter itself out? When U2 sings "one" I don't necessarily want it switching to Aimee Mann's "one" and vice versa
From the review:
Navigation using VoiceNav only operates when a song is not playing (manual controls will allow navigation when a tune is pumping), therefore there is no "Stop" or "Pause" command.
So they punted on that problem.
On another front, it looks like "one" isn't likely to produce useful responses from the speech recognition in any case. The only times the reviewer seems to have gotten acceptable recognition of track names were when saying the entire artist and title.
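For what it's worth, the ambiguity half of this is easy to show with a toy lookup. This is only a sketch under my own assumptions: the track list, the `normalize` key and the index layout are all made up for illustration, and it says nothing about how VoiceNav works internally.

```python
from collections import defaultdict

# Hypothetical track list, chosen to mirror the "One" example above.
TRACKS = [
    ("U2", "One"),
    ("Aimee Mann", "One"),
    ("U2", "With or Without You"),
    ("Aimee Mann", "Save Me"),
]

def normalize(text):
    """Lowercase and collapse whitespace; a crude stand-in for a pronunciation key."""
    return " ".join(text.lower().split())

def build_index(tracks, use_artist):
    """Map each spoken form to every track it could refer to."""
    index = defaultdict(list)
    for artist, title in tracks:
        spoken = f"{artist} {title}" if use_artist else title
        index[normalize(spoken)].append((artist, title))
    return index

def lookup(index, utterance):
    """Return all tracks matching the utterance; more than one means ambiguity."""
    return index.get(normalize(utterance), [])

title_only = build_index(TRACKS, use_artist=False)
artist_and_title = build_index(TRACKS, use_artist=True)

print(lookup(title_only, "one"))           # both "One" tracks come back
print(lookup(artist_and_title, "U2 One"))  # exactly one track comes back
```

With titles alone, no amount of recognition accuracy can pick between the two candidates; saying the artist as well makes the spoken form unique, which fits what the reviewer reported.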
Alpha has no x86 compatibility, Itanic promises some.
The FX!32 emulation/recompilation system actually provided very good transparent x86 compatibility on Alpha WinNT systems. This definitely fits high on my list of Really Cool Technologies; for most programs I could just forget about whether or not they ran native. The biggest drawback was a marketing one: it wasn't hardware x86, so as far as perceptions were concerned it wasn't real. I used it on some pretty challenging apps, though, and it worked quite well and transparently.
This is actually an interesting perception issue. Itanium has hardware support for x86 and software support for PA-RISC. The original article attributes this to a lesser priority being given to PA-RISC. While that may be true, it may also be due to the PA-RISC customers having less of a "real PA-RISC" hardware vs. software hangup than x86 customers.
So what's holding it back? [...] we still need to see more applications.
What applications, though? In particular, what applications for which Itanium would have an advantage? Whether or not the architecture is a good idea long-term, the current implementation doesn't offer a compelling performance advantage for any one application. What kind of software could make the current hardware sell?
I spent my first few glances at this wondering whether it was more likely that someone would put a compact flash slot into an Apple II or whether someone would put an Apple II (equivalent) into a compact flash device. I wonder if the latter wouldn't be marginally more useful: got an old Apple II program? Run it on your handheld.
I wish I had access to my home library right now to check references and details, but there have been a bunch of proposals like this, most of them decades ago. Some got fairly far along, consuming large amounts of money, time and effort, before ultimately collapsing. There was one book in particular I remember addressing a French experiment with alternate light rail ideas.
Working from memory, one problem they hit was the decision of whether to go for the ultimate taxi-like model, as here, or with some intermediate-sized light rail cars. The problems I recall with the taxi-like model were that they couldn't get the cars connected to enough points for people to use them enough to pay for the service, and nightmare scenarios about someone's grandmother being trapped in an unmanned small car with a deranged killer.
I'm hoping someone else can come up with the title of the book on the French experiment I'm thinking about. It focussed much more on the social problems with making this kind of project happen than on any particular technical difficulties.
Two problems: first, they're talking about messaging, not downloads from a central source, so it's harder to match the message attachments to a central cache image. Second, one of the major applications they're talking about is verbal chat, where there isn't a single high-demand image, but rather a separate recording attached to each message.
That said, I'm sure a good chunk of the multimedia traffic would, in fact, be people passing around the latest hot recording.
I thought it was a nice arrangement in the article, having the last two sections of the article be about
- A cool, novel technology with tiny cantilevered sensor/writer tips over a polymer surface giving amazing data density, and
- An incremental improvement in magnetic disks giving finer control over the head positioning.
Given the history of storage technologies, what odds does anyone here want to give to the commercial success of Millipede vs. magnetic micro-drives, even in small consumer applications that currently use flash?
There are commercial products from Mercury Interactive, Rational and so forth that do this.
Having spent some time with Segue's Silk* tools (aimed at the same market) I'd say that tools like theirs are
- expensive, and
- invaluable for really testing a GUI and keeping it tested over the long term. It's even worth getting only a couple of licenses for a group of developers rather than going entirely without.
I've seen a lot of other automated approaches consed up for testing various GUIs, but nothing replaces a tool that can actually go out and interact with the GUI directly through the windowing system. The simplest tools will let you record mouse and keyboard events; the usable ones have enough understanding of the underlying interface primitives to reasonably abstract your actions. E.g., figure out and record that you're pressing button "foo" in box "bar" rather than clicking at 80,130, or that you're editing in a text entry box rather than just getting a bunch of keyboard events, so that when you reformat a dialog box the test case doesn't break. That encourages developers to do the right thing and build up and maintain *lots* of test cases; there are a lot more stupid corners for testing in a GUI app than the command line equivalent.
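To illustrate the difference between recording raw events and recording abstracted actions, here's a toy sketch. Everything in it is hypothetical (the layouts, the widget names, the replay functions); it isn't modelled on the Silk tools or any other real product's API.

```python
# Two hypothetical layouts of the same dialog "bar": widget name -> (x, y) centre.
LAYOUT_V1 = {"foo": (80, 130), "cancel": (160, 130)}
LAYOUT_V2 = {"foo": (200, 40), "cancel": (80, 130)}  # the dialog was reformatted

def widget_at(layout, point):
    """Return the name of the widget centred at `point`, or None."""
    for name, centre in layout.items():
        if centre == point:
            return name
    return None

def replay_by_coordinates(layout, recorded_clicks):
    """Replay raw mouse events: whatever sits under each point gets pressed."""
    return [widget_at(layout, point) for point in recorded_clicks]

def replay_by_name(layout, recorded_names):
    """Replay abstract actions: find each widget by name, wherever it moved."""
    return [name for name in recorded_names if name in layout]

raw_recording = [(80, 130)]    # "click at 80,130"
abstract_recording = ["foo"]   # "press button foo in box bar"

print(replay_by_coordinates(LAYOUT_V1, raw_recording))  # presses foo
print(replay_by_coordinates(LAYOUT_V2, raw_recording))  # presses cancel instead
print(replay_by_name(LAYOUT_V2, abstract_recording))    # still presses foo
```

The coordinate recording silently hits the wrong button after the reformat, while the named recording keeps working, and that is what makes a large test suite cheap enough to maintain.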
NMR (Nuclear Magnetic Resonance) was developed by physicists, and later applied to medicine by biophysicists. At the advice of some marketing genius, they changed the acronym to MRI, knowing that most of the public wouldn't go into a giant machine with the word "Nuclear" in its title.
I'd heard this renaming story for years, and it's been frequently retold, but [Warning: vague sources follow, filtered through memory] in the past year or so I was listening to an NPR program (possibly "Talk of the Nation") and a caller who claimed to have been around for the renaming said that while "NMR" became "MRI" for PR reasons, it was for internal rather than external PR. Basically it was an academic turf war: in terms of funding and/or department responsibility, it was very strongly ingrained practice that anything with "nuclear" in the title was the responsibility of the faculties of radiology rather than medicine. The faculties of medicine didn't want to let go of it, so they renamed it.
...by setting the example that we must give these rights and that they are not explicitly given anyway goes to show that we DO NOT HAVE THE RIGHTS TO BEGIN WITH.
...The ruling that online press too have freedom of the press just shows us that it isn't a right, and could be (and in the future, might be) taken away.
At least in the case of the US Constitution, and rulings drawn from it, you may want to take a closer look at the language of the Bill of Rights. It doesn't grant rights; it restricts the ability of the government to infringe on the rights that it lists:
- "...Congress shall make no law... abridging the freedom of speech..."
- "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated..."
- "...the right of trial by jury shall be preserved..."
etc. No rights are granted. Existing rights are acknowledged.
Given the "what if we..." comments following up here, I strongly recommend reading Kim Stanley Robinson's "Red Mars", "Green Mars" and "Blue Mars". The books start with a near-future colonization of Mars and go through one very well developed "what if" path, covering not just the technology but also the social and political engineering that follows. The idea of deliberate greenhouse warming shows up, among others.
Oh, yeah: they're also good reading, with an interesting set of characters.
I would guess that Microsoft is well aware of this effect, though. Is there anything requiring that XP be equally tight about activation for all national variants? (Not a rhetorical question.) If not, I'd expect Microsoft to be much more lenient in countries where this was a real threat to their market share.
This isn't a particularly startling result. Many of the things an x86 compiler has to optimize for these days are similar across all processors: e.g., regular branch patterns are faster than unpredictable ones; you have very few visible registers; it's helpful to have closely associated data in the same cache lines; you're usually better with the RISCy subsets of the ISA; etc. Intel would have had to go well out of its way to optimize for their own chips and pessimize for others, and I can't see Intel bothering.
Just because Aerie purchased the assets for $8.25 M doesn't necessarily imply they can immediately turn around and sell these assets for a profit of some $740 M as you've stated.
Sorry, that's not what I meant. Rather: Metricom's failed business plan consumed $750M to get their infrastructure to where it is now. Maybe they were foolish and threw money away, or maybe that's just what it cost to build what they built in the time period in which they built it.
My point was just that Aerie stands a much better chance of having their business plan work, whatever it is, given that their cost of "building" a working infrastructure is $8.25M+reactivation costs and legal fees. That's not a business plan that Metricom could have followed.
Someone please double-check my reading of the press pieces:
Metricom winds up $1G in debt, with creditors expected to recoup about a quarter of that when you count cash in the bank and bond interest; so, about $750M loss for creditors.
Aerie acquires all the interesting assets of Metricom -- that is, everything unique that would cause investors to take the risk -- for $8.25M.
Chapter 7 protects Aerie from the ~$740M difference.
I know it's hardly a unique situation, but the numbers jumped out at me this time. It's such a great investment strategy: if you can just figure out how to get someone else to spend 100x what's profitable and then have them lose so badly that you can buy the interesting bits at a garage sale.
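Here's the arithmetic spelled out, using only the figures as I read them above. The variable names are mine, and treating the creditors' loss as the effective cost of building the network is my own simplification.

```python
debt = 1_000_000_000        # "$1G in debt"
recovery_fraction = 0.25    # creditors recoup "about a quarter"
purchase_price = 8_250_000  # what Aerie paid for the assets

creditor_loss = debt * (1 - recovery_fraction)
difference = creditor_loss - purchase_price
ratio = creditor_loss / purchase_price

print(f"creditor loss: ${creditor_loss / 1e6:.0f}M")  # $750M
print(f"difference:    ${difference / 1e6:.2f}M")     # $741.75M
print(f"ratio:         {ratio:.0f}x")                 # 91x
```

So the "~$740M" figure holds up, and the loss is roughly 91 times the purchase price, which is the same ballpark as my "100x".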
Sorry, I don't buy it, if you're arguing that removing consonants makes command discrimination easier. Yes, consonants, especially stops, are hard to identify on their own, out of context, but with speech recognition you're not doing an instantaneous reading on a single sound out of context. You're doing an order- (and usually duration-) sensitive match of an entire utterance against one or more patterns. Even if you collapse all the stops, nasals and fricatives into a single "consonant" model, which is incredible overkill, the presence or absence of consonants in the pattern still gives you information.
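A toy illustration of that last point, with the caveat that this is nowhere near a real recognizer: the four commands and their one-letter-per-sound spellings are made up, and exact comparison of whole sequences stands in for the order-sensitive pattern match.

```python
VOWELS = set("aeiou")

# Hypothetical commands, spelled as crude phoneme strings (one letter per sound).
COMMANDS = {"go": "go", "no": "no", "on": "on", "oh": "o"}

def full(pattern):
    """Keep every sound as-is."""
    return pattern

def collapsed(pattern):
    """Replace every consonant with a single generic class 'C'."""
    return "".join(ch if ch in VOWELS else "C" for ch in pattern)

def vowels_only(pattern):
    """Drop the consonants entirely."""
    return "".join(ch for ch in pattern if ch in VOWELS)

def distinct_patterns(transform):
    """Count how many different whole-utterance patterns survive the transform."""
    return len({transform(p) for p in COMMANDS.values()})

for name, transform in [("full", full),
                        ("collapsed", collapsed),
                        ("vowels only", vowels_only)]:
    print(name, distinct_patterns(transform))
```

With full patterns all four commands are distinct. With every consonant collapsed into one class, three patterns still survive ("Co", "oC", "o"), because where the consonant sits in the sequence still matters. Drop the consonants entirely and all four commands become the same pattern.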
But voice activated systems are stupid, anyway...speech is one of the slowest forms of human interaction, and is one of the few we have to actively concentrate on to perform.
You're faulting speech by comparing full-out, general spoken natural language to much more restricted modes of input, like a button or dial. The spoken equivalents of a button or dial can be as quick and easy as the tactile versions, and with very little practice they become just as automatic.
Speech isn't the right input method for everything, but then neither is the keyboard, the mouse, the pen, or the steering wheel. Computer-targeted speech is good, and worth putting some effort into, when:
- You need to quickly select from a very large, known list of options; you very quickly hit the point where speech is way faster than mouse, dial or stylus.
- You need to do something with text and a keyboard just isn't feasible for size, convenience or portability reasons. Speech input of text is almost invariably faster than a stylus.
- Your hands are busy, remote (from the device, though I suppose also if they're remote from you) or incapacitated. "Remote" in this case includes "at the other end of a phone call".
- You're dealing with text that's sufficiently uniform that you can speak at full speed, in which case you're going faster than almost everyone types. (And there are contexts where that occurs.)
I don't know, but I already learned one interface (typing) to make my computer's life easier. Why should I do all the work?
This is probably the single biggest problem that large-vocabulary speech recognition had and has in getting adopted, even where it's a good fit: it requires you to learn to use it rather than "just talking". Some people say "I already learned one interface..." Even more have simply forgotten how long it took them to get comfortable with a keyboard and compare the pain of a new interface to the habit of years.
Any new interface requires some accommodation from the user.
The computer can't distinguish words easily, so we'll give you a potentially much smaller vocabulary and see if it does better? Of course it'll do better, whether or not that smaller vocabulary contains consonants.
What I'd worry about is whether these unarticulated sounds sound more like background noise than articulated speech; if so, then you've made the situation worse by making it harder for the computer to know when you're talking to it.
On "uh oh": Dragon Dictate (discrete speech recognition from a few years ago) used "oops" for telling the SR system when it made a mistake; it was reasonably easy to distinguish from words that you actually wanted to put into your text with any frequency.
- It's got the Intel name on it, and will be marketed accordingly.
- It'll run Windows.
- It hasn't been dead-ended by its manufacturer.
Advantages over x86 processors:
- It's slightly better for high-performance floating-point computation.
- It makes a better space heater.
- It'll keep determined assembly-language hackers out of your hair for a while.
I think that about covers it. Not all of those will be advantages for every consumer, of course, nor do all the advantages apply to every competitor.
The actual MaGIC spec is available from Gibson's site.