FBI Conducts Feasibility Study on Project Sentinel
leave-no-trace writes "CNN reports that 'FBI officials hope to award a contract by the year's end for a complex new software program (dubbed Sentinel) to replace a failed project that was canceled this year at a cost of more than $100 million to taxpayers.' The system is supposed to include search capabilities, protocols for processing and handling FBI reports, security issues and a new system for records management. FBI Director Robert Mueller told lawmakers he is unable yet to place a price tag on the Sentinel project."
Hmmm...I don't think federal executive positions pay as much as you think they do. http://www.opm.gov/oca/05tables/html/ex.asp
If you don't know where you are going, you will wind up somewhere else.
For the hardware setup (scale) and general search solution, Google is very good. However, it is not for every problem.
Google does not have nearly the contextual capabilities of some (custom-fitted) search engines. At some point, you need automation and a level of reliability. You can't have a person looking at everything. And repeated searching on the same dataset, which we take for granted, is often necessary to garner sufficient results. Who says when we have found the right information?
Google does not provide complex taxonomy or a feedback loop mechanism (which can be very complicated - often patented or proprietary).
In the original PageRank thesis, it was made clear that context was entirely up to the user. When dealing with records (i.e., highly redundant data that must be cross-referenced extensively), Google falls flat.
Let me greatly over-simplify. Consider "Joe Smith civilian" and "Joe Smith terrorist". Google will not distinguish the two Smiths. It will only distinguish the phrase in relation to the index. So even if we have a link between Smith the terrorist and Smith the civilian, we can still mix them up (unless we mark everything explicitly). We need context (not just words in the same document, sentence, etc.), and as our search pattern homes in on matches (repeated, refined searching), we need better classification or we go in circles.
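To make the Joe Smith point concrete, here is a rough Python sketch. Everything in it is made up for illustration: the records, the "entity" labels, and the function names are my own assumptions, not how Google or any FBI system actually works. It just shows that a plain word index returns every "Joe Smith" record, while records that have been explicitly marked with an entity can be told apart.

```python
from collections import defaultdict

# Hypothetical records: all mention "Joe Smith", but refer to different people.
# The "entity" field stands in for the explicit marking discussed above.
RECORDS = [
    {"id": 1, "text": "Joe Smith renewed his driver license", "entity": "smith-civilian"},
    {"id": 2, "text": "Joe Smith was seen at the training camp", "entity": "smith-terrorist"},
    {"id": 3, "text": "Joe Smith filed a tax return", "entity": "smith-civilian"},
]


def build_index(records):
    """Build a plain inverted index: word -> set of record ids."""
    index = defaultdict(set)
    for rec in records:
        for word in rec["text"].lower().split():
            index[word].add(rec["id"])
    return index


def keyword_search(index, query):
    """Return ids of records that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def entity_search(records, index, query, entity):
    """Keyword search, then keep only records marked with the given entity."""
    hits = keyword_search(index, query)
    return {rec["id"] for rec in records
            if rec["id"] in hits and rec["entity"] == entity}


index = build_index(RECORDS)
print(sorted(keyword_search(index, "Joe Smith")))
print(sorted(entity_search(RECORDS, index, "Joe Smith", "smith-terrorist")))
```

The first search returns all three records because the words alone cannot separate the two people. The second only works because somebody (or some classifier) already attached the entity label, which is exactly the expensive part.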