Domain: autonomy.com
Stories and comments across the archive that link to autonomy.com.
Comments · 27
-
Autonomy creates super magic computer?
Can anyone tell me what this thing actually does?? http://www.autonomy.com/content/Products/products-idol-server/index.en.html HP offered to buy this company, which for all appearances, serves a great purpose: Enterpriseyness
Seriously, what the eff do they do? It reads like they invented AI, but I think I would have heard about that. If HP is serious about this company, I think someone at the top is enforcing a new mass hysteria policy.. Either that, or I am just seriously too dumb for this Autonomy company who is clearly the "market leader" as their quote banner says (in what market again??).. -
Re:Who's the real winner?
It's something that will compete directly with Autonomy. This kind of searching is big in certain industries, law perhaps the foremost. The ability to search through documents quickly and find what you want means you can fire your legal aides. That is worth what, $60k a year, per person that you can fire? So people are willing to pay big money to Autonomy and IBM. I'll bet Autonomy is wishing they'd thought of this stunt.
-
Already done. Sort of...
http://www.autonomy.com/ has had a working, enterprise search version of this for quite some time. While not a web search tool, it's very much along the same lines.
-
Re:Standard Machine Learning...
Disclaimer: I work for Autonomy (and worked for Google too)
Yes, you describe the technology correctly. And it works exactly as in the patent, even though the patent legalese describes it in a complicated way. The TF-IDF you describe is an application of Bayes' theorem of probability. These methods are terribly efficient.
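For anyone who hasn't met TF-IDF, here is a minimal sketch of the textbook weighting in Python. It is not Autonomy's implementation or the patent's method; the function name tf_idf, the whitespace tokenizer and the sample documents are all made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            # Term frequency times log inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Made-up sample documents, for illustration only.
docs = [
    "jay leno hosts the show",
    "the show airs tonight",
    "the patent describes the search engine",
]
weights = tf_idf(docs)
```

A term that appears in every document (here "the") gets weight zero, while a term confined to one document (here "jay") scores highest, which is why rare terms dominate the ranking.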
Once you have enough data, you can categorize any content, and Virage is quite cool at extracting text from an audio/video file. As someone has said, it _can_ recognize a Jay Leno video once you have indexed enough Jay Leno videos.
Feel free to check the http://autonomy.com/ website (skip the usual marketing stuff). -
Google just playing catch-up in enterprise search
This is just Google finally doing what Other Enterprise Search vendors have been doing for years.
Any worthwhile enterprise search has been able to search across multiple data types and sources long before this "news" by Google.
-i -
Ever heard of...
I have been working with search engines for some time now, and the 'concept search' that IBM is mentioning is nothing new. Actually, the Cambridge UK-based company Autonomy http://www.autonomy.com/ has been market leader in this field for years. IBM even has some specialists on Autonomy working for them... makes you wonder...
-
Autonomy's been doing this for years
See their technology overview. I believe they have a number of (ugh!) patents on Bayesian text analysis. They were founded by a Dr. Michael Lynch to productize research he did at Cambridge U.
-
It already exists
I've already seen/heard of such a system, basically in the Business Intelligence field.
In England, a system like Autonomy (used by the police at the beginning) can crawl a mass of information with dedicated spiders (not only for the web, but also commercial databases, files...). Then it structures all the content into themes, with links and proximity.
I personally tested it some years ago, feeding it with information websites and asking for articles "close to" another one. The efficiency was amazing because it could tell apart close terms that have really different meanings depending on the context. Usually, search engines get this wrong because they can't use the context.
I also set up some "agents" for recurrent searches (an agent is basically a search plus some training, letting Autonomy know which found documents are close and which are not) and it was able to propose a really good press review every day with nearly no wrong documents.
As a complement to Autonomy, I know a BI team that uses other tools like Pericles to feed the searches with "relevant" content, basically themes that are "appearing" in the group of documents and are close to some interests.
Such BI tools can already provide the kind of information cited, like an opinion movement against a company detected in newsgroups or on some websites. And IBM is certainly on track to improve such tools with the techniques from their labs.
I hope these tools won't be limited to PR articles on the web and/or private use by big corporations, because it could only be another Echelon with all its bad consequences:
- bad use of public information
- paranoia fed by false scares
- public/corp. power against the citizens
If tools like Echelon could be used by everybody, they would have to leave much more privacy to citizens, and public leaders would have to explain the investments. -
Yes, unless you're from another planet...
I was at the European Technology Forum Technology Summit event in London recently where Mike Lynch (Ph.D), CEO of Autonomy gave a keynote speech responding to this article. Essentially, Mike said that the article is "fundamentally wrong" and "anyone who asserts IT doesn't matter must be from another planet". A very interesting read... we've barely scratched the surface when it comes to IT. Autonomy develop software to manage unstructured information, something current IT is not very good at.
-
Re:Search on msdn.microsoft.com
Sounds like where Autonomy were 3 years ago to be honest. They had a demo where typical Office-style activities were monitored, linked and categorized automatically, meaning that when you went back to an email or similar, relevant local documents appeared as links on the side as a kind of semantic web, or you could search based on criteria as you mention, or you could get it to digest large docs and present you with a summary.
More impressive still was a CNN feed going into a speech recognition engine - as someone was talking, references to web and local documents came up on the subjects mentioned. Quite spooky. -
Re:How does the metadata get into the database?
I'll pass quickly over your dubious terminology and get straight to repudiating your conclusion:
There are a number of successful products that automatically infer ("extract") categorization information ("meta-data") from unstructured data and are certainly more than "marginally useful".
A structured information storage system for Linux is something to be welcomed rather than sneered at. It will immediately improve the accessibility and coherence of information now haphazardly stored in dozens of different semi-structured forms, and, later, in conjunction with the kind of automated tools referred to above, has the prospect of growing into a valuable information processing system. -
Re:How does the metadata get into the database?
this is not information that the computer can necessarily gather for itself
Not necessarily, but systems like Autonomy's automatic categorization search can do a pretty good job.
I see several posts here point out the difficulty of maintaining category information (erroneously referred to as metadata) manually so it seems clear that progress in automatic classification is needed to complement more sophisticated storage structures. -
Bayes with good interface
I used to beta this thing by this company called Autonomy which would sort and sift all your (and everyone else's) cruft to assemble a list of relevant links (to your stuff and others') in response to your activities.
IMO it did this in real-time, must have made for some impressive indices.
Maybe this is the answer, open-source Autonomy. I am a mere perlmonks acolyte so I will leave it up to the real brains to figure it out ;-) -
Re:AltaVista appliance for intranet searching?
You're missing by far the biggest intranet vendors. Verity is the king of this market, and has been since the mid-90s. They get a lot of mileage out of their OEM sales; it sounds simpler to a company if they hear that they "already have Verity" within Documentum or Cold Fusion or whatever.
It'll be very interesting to see what they do with the Inktomi purchase. (They bought the productized search before Yahoo snarfed up the external services.) Inktomi is IMHO the best intranet search engine right now. (I believe Verity is dropping the Inktomi name and is calling the tool Ultraseek, which goes back to Inktomi's acquisition of Intelliseek.) The purchase gave Verity yet another leg up with enterprise search, it'll be interesting to see if they leverage the technology or if they see this more as a marketing move.
Google is obviously a big player here too. Don't need to evangelize to the /. crowd on that. However, Google still has a way to go in understanding how to tackle enterprise search. Autonomy is another big player in the enterprise, though I am less familiar with their tools.
Other interesting enterprise search vendors include FAST, Isys, and Divine/Northern Light (yes they're still around). Teoma/Ask Jeeves could get there if they productize their search tool. Lots of interesting approaches there but nobody who's quite moved up into the first tier.
Anyway, it's a messy space even with all the consolidation above. I have no idea whether Ovation will keep up their enterprise sales effort or not; I suppose it depends on how profitable that part of the business is. Guess we'll find out...
-
Solution
I don't know how much data they are actually talking about, but I can offer up a solution.
Some of you might disagree. I've run into a scalable piece of software which will interrogate all their information sources regardless of their storage format, index them, and still leave them all in their respective locations.
Autonomy Inc. has a product called DRE AXE which is also XML compliant. They have a pretty simple API to work with, and I have even seen it work with Java, PHP, and Perl. The query engine is extremely fast, and supports layman's terms. The engine supports both Boolean and natural language queries. Check them out, I've been administering their products for about 2 to 3 years now.
Ok, Ok, I'm giving them a plug, but hey their product works well. -
Tomcat with Autonomy
We use Tomcat/Apache/J2EE in a production environment here with Autonomy's Portal in a Box. We've found that it performs much better than a Win32 install with the same hardware and specs. Tomcat, however, is much harder to configure and get working with proprietary software than, say, New Atlanta's ServletExec, which I found much easier to configure and install. But once it gets going, I think Tomcat is as rock solid a server engine as any other.
-
Manual registration isn't the only way to do this
I personally don't like having to fill in my details with websites, although I have with those that are any good and allow me to unsubscribe from mailing lists of adverts. But this is not the only way to retrieve marketing information about visitors to a site:
Companies such as Autonomy and Escape Velocity Technology have technology that does automatic collection of your online habits and can form a picture of you so that each time you visit a website powered by their products it can be adapted to your behaviour. There are no forms to fill in, it just works off exactly what you've done; so if you read articles on company profits (or losses as the case currently seems to be out there!) then you'll get similar articles presented to you more often and adverts might be for online stock brokers (or debt collectors ;) They can even get down to the order you do stuff and what times you do it. If you know what someone does you have a good chance that you can work out the demographic "bucket" they fit into and can use that for e-marketing.
The Internet is almost anonymous: you can be who you like, when you like it. Filling in forms with misinformation is just like creating a "new you", but can you break the habits of your real persona? -
Searching
Most CMSs actually don't really bother about searching via their internal datastores, mostly because of the problems you've raised above.
What you tend to get are two (not mutually exclusive) approaches:
- Spidering of content using a packaged solution from vendors such as Verity, Autonomy or Thunderstone
- Internal datastore searching of taxonomy - metadata about how the content is organised. This is the harder one to crack as an effective taxonomy takes a looooong time to get right for a half-way complex site.
Sidenote - you can get around those query string URLs with most serverside scripting environments (as well as that PHP tutorial, it's worth bearing in mind that evolt's site is built in a similar way using Cold Fusion)
-
This is where Autonomy started
What is now Autonomy, the knowledge management company, started about ten years ago when Mike Lynch's PhD research was sponsored by the police in the UK, to find ways to scan the mass of witness statements that are gathered in a major incident enquiry (often inconsistent, with varying content and terminology), and to automatically identify important features and cross-reference them.
From that original start, they then (allegedly) gained the interest of the intelligence services, and then the media companies and dot coms, to become the players they are today.
-
Ontologies: handmade vs. automated
From the CYC website:
CYC's knowledge base is built upon a core of over 1,000,000 hand-entered assertions (or "rules") designed to capture a large portion of what we normally consider consensus knowledge about the world.
As the interview with Google on /. yesterday brought out, one of the great challenges of the moment is how to take enormous quantities of easily available data, and store it for retrieval in ways that reflect an understanding of the real world. (One might try to quantify the "intelligence" of a database by the extent to which it can achieve this kind of data association / data reduction). Good ontologies are a big part of this -- identifying and distinguishing different contexts, associated with their likely possible properties.
The work CYC have done in finding good ways to represent such ontologies is important, but only goes so far -- in particular it seems to be essentially static. What impresses me more is some of the work that has been done elsewhere to automate the process of the discovery and maintenance of ontology -- extracting it dynamically from the associations revealed in a large pile of documents.
One example of a site which is an end user of such technology is the well known news portal moreover.com, powered mostly (I believe) by Autonomy
-
Natural Language Processing...
Well, the article speaks of a lot of things, mostly though, it's links to specialized search engines. It gives the impression that in order to really find what you are looking for, you should use a highly specialized search engine. I disagree a bit on that.
I know there are companies out there that have the technology to "put it all in one" so to speak. I have worked a little with Autonomy, and I gotta say, I am deeply impressed by what it does. They employ technology called Bayesian Inference (from Thomas Bayes). The technology has to do with "calculating the probabilistic relationship between multiple variables and determining the extent to which one variable impacts on another" - Sounds wild, eh? Well, it is. Together with this, their core engine, called DRE (Dynamic Reasoning Engine), relies on the theory of Claude Shannon, which states that "the less frequently a unit of communication (for example a word or phrase) occurs, the more information it conveys".
The more input you give it, the more accurate it will be. Oh, and it's actually for all kinds of unstructured information - also e-mail.
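The Shannon idea quoted above can be put in a few lines of Python. This is only a sketch of the textbook self-information formula, -log2(p), and not the DRE's actual algorithm; the function name self_information and the toy sentence are assumptions for illustration.

```python
import math
from collections import Counter

def self_information(corpus_tokens):
    """Map each word to -log2(p), where p is its relative frequency (hypothetical helper)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    # Rare words have small p, so -log2(p) is large: they carry more bits.
    return {word: -math.log2(count / total) for word, count in counts.items()}

# Toy corpus, made up for illustration: "the" occurs 3 times out of 8 tokens.
tokens = "the cat sat on the mat the end".split()
info = self_information(tokens)
```

In this toy corpus a word seen once in eight tokens carries 3 bits, while "the" carries fewer, which matches the claim that less frequent units convey more information. Estimates like these get steadier as the corpus grows.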
I ramble. You should check it out.
Autonomy also makes Kenjin, which is a piece of software that you install that will understand what you are looking at, and help you search for similar stuff. Kinda kool. -
Natural Language Processing
You might want to start looking up some stuff about Autonomy. This is possibly the fastest growing tech company in the UK. They have a product that uses NLP to help with search solutions, agents, and knowledge management (to use the new-media-hype terms).
I have seen their products in action and they are *extremely* impressive. What might be of most interest to you is the way their algorithms work. Now I'm sure they won't release that info, but you may be able to glean enough info from their material to get on the right track. (Start here perhaps). Basically their technology is based on Bayesian algorithms (Bayes was an 18th century English cleric who came up with some cool ideas about NLP that couldn't be proven until recently when the computing power and information volume to make them work became available - how's *that* for far-sighted!) combined with Shannon's Information Theory. It is WAY powerful in practise!
I agree that it would be really cool to see some kind of automated moderation system based on this type of principle. I'd also *love* to see some Open Bayesian NLP work.
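To give a feel for what open Bayesian moderation could look like, here is a minimal naive Bayes text classifier in Python. It is a generic textbook sketch, not Autonomy's algorithm; the class name NaiveBayes, the labels "troll" and "insightful", and the training comments are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (hypothetical sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log P(label) plus the sum of log P(word | label).
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((count + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training comments, for illustration only.
nb = NaiveBayes()
nb.train("buy cheap pills now", "troll")
nb.train("cheap cheap offer click now", "troll")
nb.train("the indexing algorithm uses bayesian inference", "insightful")
nb.train("interesting point about search engine indexing", "insightful")
```

Calling nb.classify("cheap pills offer") picks the "troll" label and nb.classify("bayesian search indexing") picks "insightful". A real moderation system would need far more training data and a better tokenizer.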
Good luck with the experiment.
"Give the anarchist a cigarette" -
Re:Uhmm.. Is this even POSSIBLE?
Yes, this is possible; there is software out there to make it feasible, for example something like Autonomy. In fact, when I was at a training session with that company last week they freely admitted having sold their product to some unnamed British intelligence agency...
-
Autonomy
Keyword matching searches, even with Alta Vista's context database, are clumsy and commoditised. There is simply no business value in the company considering what is now a non strategic asset (i.e. very hard to prevent a rival duplicating) as a key piece of intellectual property, when products such as Autonomy are using AI and Bayesian Inference to perform searches on large document sets at an accuracy Alta Vista can't touch.
Having said that, note that Alta Vista are keeping their actual database to themselves - this is the one real asset other than their brand which they possess. Taking these two together, we see a core competence (i.e. leveraging them provides a return disproportionate to effort in relation to the market sector), which is now the basis of their revenue plan.
-
Re:They aren't all bad.....
I've never had a problem with recruiters like this - I can even phone them up, recite the URL of my CV and they won't stop calling me back...
He complains that they didn't accept his document format; I dare say the head hunter said to his buddies, "here's some dude who ignored our clear instructions on what formats we accept".
The state of the average recruiter's database is such that, for the majority of positions, they will have hundreds of vague matches, and a few good ones - without computers to sift them (after all, this is what computers are good at) they wouldn't be able to do their jobs.
Of course, the recruiter with the best database searching technology is the one who'll get most of the commissions, so expect to see them get better, quickly. I wonder if any of them can afford to implement Autonomy?