Google Wins Rights to Aussie Algorithm
rcbutcher writes to tell us the Sydney Morning Herald is reporting that Google has just acquired the rights to a brand new text search algorithm invented by a University of NSW student. From the article: "Orion works as an add-on to existing search engines to improve the relevance of search and won praise from Microsoft founder Bill Gates last year. [...] Orion finds pages where the content is about a topic strongly related to the key word. It then returns a section of the page, and lists other topics related to the key word so the user can pick the most relevant."
The Sydney Morning Herald struggles with computer-related articles. The range of topics they cover is interesting. Sometimes they even have articles about Linux kernel news. Their accuracy usually isn't very good, though. I've reported a couple of errors to them in the past month or so. In one article, they got Electronic Frontiers Australia mixed up with Electronic Frontier Foundation, but still used the acronym for the other organisation.
I'm curious about whether these inaccuracies are limited to science/computers. It's entirely possible that the media sources we trust to be accurate are actually riddled with errors.
While Mr Allon is the key person behind Orion, the university retains ownership of the intellectual property as it was developed within the university's research facilities.
Bleh, sometimes I think I shouldn't leave my house for fear of coming up with an idea that someone else can lay claim to. It could be that he needed the computational resources of the university to develop the algorithm, but it's easy to imagine the university laying claim to it even if he was working without any real assistance.
I know that there are a number of issues around this (where do you draw the line?), but still - in general writing algorithms is a creative act, so they should belong to the creator(s), if it is even possible to own an algorithm.
Do a Google Scholar search for publications in CS/EE, and you get... nothing.
His own web page is bare, with no details.
A Science Daily article from September 2005 (yeah, over 6 months ago) mentions this "algorithm", but gives scant details.
I highly doubt the novelty/effectiveness of this "algorithm" if it has been patented before being published in a peer-reviewed journal.
I read a book on the Google story a while back. What I remember is that when they came up with the algorithm, they worked with Stanford to pitch it to Altavista, Yahoo, etc. They wanted about $1 million for it but nobody wanted it. The Google guys just wanted money so they could scale up their experiment with more computers and storage, but none of the big guys could see any money in search engines. Then, at the prodding of the Stanford folks, they found a few angel investors and built up their company, and the rest is history. So I guess the Google guys don't want to miss any opportunity, and probably have a soft spot for these college students because they were once in the same place.
I like his initiative though. I wonder if he looked around at the current marketplace and thought "hmmm... so I've got a few years to research something... Google's looking pretty hot right now... why not build something I can sell them at the end of it?". If he did, he's smarter than the average bear.
Actually I did a similar thing during my undergraduate degree in the early-mid 90s. I designed a very early back-end/database for a generic web-based online store. About 2 weeks into my project I got a call from a big record company (who apparently had heard about my work) and they bought it, despite it being mainly on paper at that point. I won't say who it was, but I ended up working for them for a short time after I graduated, and as far as I'm aware, their site still uses the core of my code.
Since it sounds like he was a student immediately before, it sounds like a step up in his career, and the only possibly evil thing I ended up seeing here was that Google is taking on a tech with Microsoft praise.
Beware: In C++, your friends can see your privates!
... at least, not when they have terabytes of data to search through. While Boyer-Moore is an asymptotically optimal algorithm for non-indexed string matching, Google (and everybody else who wants to perform multiple searches against the same data set) uses indexed matching algorithms.
With indexed matching algorithms, you can search for a string of length M within a string of length N in M + log(N) steps -- far faster than B-M's M + N/M steps -- and you can even search for matches with mismatches (e.g., locations where the strings match at 50% of their positions) almost as fast as B-M (asymptotically B-M finds exact matches log(N)*log(M) times as fast as matches-with-mismatches can be found).
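To make the indexed-vs-scanning point concrete, here is a minimal sketch of one indexed approach, a suffix array with binary search. This is only an illustration, not what Google or anyone else actually runs, and the function names are made up for the example. Note that this simple version costs roughly M * log(N) character comparisons per query; getting down to the M + log(N) mentioned above needs extra longest-common-prefix tables that are left out here.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Fine for a demo; real indexes use much faster construction algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    # Return every position where pattern occurs in text, in ascending order.
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [first, lo) begin with pattern.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)   # built once, reused for every query
    print(find_all(text, sa, "ana"))
    print(find_all(text, sa, "nan"))
```

The index is built once and then every search is a couple of binary searches over it, which is why this pays off when you run many queries against the same data set, and why a one-off scan like Boyer-Moore is the better fit when you only search the text once.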
Tarsnap: Online backups for the truly paranoid
You do realize that:
(Oops, got carried away there.) For me, I happen to enjoy Cooper's Stout. Basically, from the sounds of it, Fosters is about as authentic as Outback Steakhouse.
--JoeProgram Intellivision!