Slashdot Mirror


ODU To Develop Deep Web Search Engine

jvsanford writes "Three Old Dominion University (ODU) computer science professors plan to develop a 'deep' web search engine that searches digital libraries and collections that expose their metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). They are also planning to develop an Apache module, mod_oai, that will increase the number of people who can export their metadata and resources via OAI-PMH."
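
For readers unfamiliar with OAI-PMH, here is a minimal sketch of what harvesting looks like from the client side. It is not ODU's design or mod_oai's behaviour. It only builds a standard ListRecords request URL and pulls Dublin Core titles out of a response. The base URL, the helper names and the sample record are all made up for illustration, and nothing here touches the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    A resumptionToken is an exclusive argument, so when one is given
    it replaces metadataPrefix rather than accompanying it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hand-written sample response (assumption: not taken from any real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, follow any resumptionToken in the response until the list is exhausted, and handle protocol error elements. All of that is left out of this sketch.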

9 comments

  1. Hope it works, unlike Google by Anonymous Coward · · Score: 0

    Hope it does accurate phrase searching. Google keeps coming up with pages that don't contain the search phrase. Rather frustrating.

    1. Re:Hope it works, unlike Google by Anonymous Coward · · Score: 0

      I notice that you don't give an example. Also, did you check the Wayback Machine to make sure that your phrase wasn't just removed from the page? And did you try view source to see if the phrase was hidden in some way? Or are you just another whiner? Hmm?

  2. Two examples: by Anonymous Coward · · Score: 0
    Two examples:

    "AB RAIN" (a brand name of a product. 3 of the results are irrelevant/erroneous)
    "to be or not to be" (2 of the top 10 results do not contain the actual phrase).

    These are just a couple of examples. Google has perfect results for single words, but when it comes to phrases it has trouble making sure that the pages in the result set actually contain the phrase being asked for.

    1. Re:Two examples: by Anonymous Coward · · Score: 0

      Error between the chair and the keyboard. Put quotes around your search. By my count, all 10 top hits contain the phrase.

  3. Hacking National Security by bluethundr · · Score: 4, Interesting

    The last "HOPE" conference, a couple of years ago (this year's is happening July 9-11), was the first time I heard of this idea of the "deep web".

    The topic was something called "Hacking National Security", in which the speaker, Robert Steele, first brought up this concept and mentioned what he described as a "deep web search engine" called Copernic. However, I've found that the product (there is a free variant) basically queries a list of different search engines. This is not what I would consider a "deep web search" now that I have learned a little more about the term. But that was the first I'd heard of it.

    Robert Steele can be forgiven for being a bit technically naive, because his specialty is National Security and not technology. But he had a lot to say that was of salient interest to technology-minded folks. Why else would he have had a panel discussion at a hacker conference?

    What I learned from him is that search engines like Google and others are only able to skim roughly 5% of the total content of the web. Everything underneath that 5% is the "Deep Web". This is what he claimed the global terror networks are using to communicate with each other. And, most alarmingly, that the NSA, America's Information Processing branch of the government, was COMPLETELY ill-equipped for, even ignorant of, terror groups freely trafficking their plans on the web. Talk about our most "advanced" information processing governmental body! Note the lack of a CNAME entry in their DNS record! Don't forget the "www" now! yeesh! At any rate I read an interesting book about them way back in the 80s called The Puzzle Palace. But I'm sure it's way dated by now. I read it way back in 87. Did you know that they are roughly 3 times the size and girth of the CIA...and yet hardly any of the lay populace seems to have heard of them! I once dated a "know it all" (how do you ever learn anything if you already "know it all"?) bad-poetry, arty farty girlfriend who claimed that I was "making the whole thing up" when I tried explaining to her about the NSA! May I say again, "yeesh"? Literally COULD NOT convince her otherwise...I digress...

    Now hold on a minute here! Just how dated would you suppose that book to have been? One of Robert Steele's pet peeves was the extreme datedness of NSA technology. Being a government agency (FLAGSHIP of intelligence agencies!), a good hunk of their computer technology dated back to the 70s. This was still the case as of 2002, mind you, if I understood him correctly.

    Now, another of his complaints was the lack of native speakers hired by the agency. That is, instead of hiring a native Pashto speaker, they will almost unerringly hire the "blond haired, blue eyed, cocky midwestern jock" (his words not mine) with a degree from an Ivy League school in linguistics who has a generalist's knowledge. What's wrong with a young PhD in linguistics tending to these matters? According to Mr Steele, even the best generalist's knowledge will not catch the flavor or nuance of language spoken on the terror sites. What's lost in the translation? Not much...if you don't count our National Security.

    Also according to him, the "terrorist community" (I know that's an over-used term in this day and age...please try to bear with me, here) knows this and thrives on it.

    One major point of contention he had wa

    --
    Quod scripsi, scripsi.
  4. Old Dominion University by Marxist+Hacker+42 · · Score: 1

    Anybody else's first thought on reading this article description: "Why do the shape shifters want to deep link into our web?"

    --
    SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
    1. Re:Old Dominion University by Anonymous Coward · · Score: 0

      Actually, my first thought was "Why are they attempting this when they can't even get a train built?" Furthermore, my experience with ODU people has led me to believe that those who are not arrogant and unreasonable are the minority. Good luck ODU; I'll be surprised if you can cobble together anything better than wget | grep.

  5. Is ODU reinventing the wheel? by bigsteve@dstc · · Score: 4, Interesting
    DSTC has already developed a commercial-strength product that provides most of this functionality, and more. The MetaSuite product line includes:
    • A metadata repository and search engine with a tailorable web-based user interface, and OAI repository functionality.
    • User query refinement using a GuideBeam plugin.
    • An OAI Harvester for once-off and periodic fetching of metadata from other OAI repositories.
    • A Gatherer that extracts metadata from web-pages.
    • A Metadata Editor for creating validated metadata records in the repository and/or adding them to web pages.
    • A Metadata Schema compiler for defining metadata schemas and the associated validator plugins. Support for DC, AGLS, ANZLIC / ANZMETA metadata schemas is standard.
    • An architecture that supports plugins for custom metadata access control, workflows, record formats, search result ranking, display rendering and so on.
    The only significant thing missing from MetaSuite at the moment is free-text searching of linked documents whose metadata has been entered into the repository.

    For more information, please refer to the MetaSuite product web pages. For example customer sites, try the Australian Virtual Engineering Library, MIRMgate and Australian Digital Thesis. [None of these sites have so far chosen to enable OAI repository functionality, but it literally would be a two-minute job to do this.]

    Disclaimer: I work for DSTC.

  6. 8th p0st!!! by NorthDude · · Score: 0, Offtopic

    How's that as my first try at trolling? :-P

    --


    I'd rather be sailing...