Slashdot Mirror


Mining Unstructured Data

jscribner writes "Data these days tends toward an unstructured form, be it text (like the web, email, or books), spoken word, or even DBs with unique organization (and thus a discrete language). There's a new article on Unstructured Data in Think Research; it's an overview of the challenges, progress, and potential rewards in this area. I'm leaving it on your doorstep because, to me, it's a good launching point for discussion of several interesting possibilities: /. as a minable DB of ideas, email identified by interpretation rather than keywords, emotive XML, etc."

6 of 105 comments

  1. /. as a Turing Test by bravehamster · · Score: 5, Funny
    email identified by interpretation rather than keywords


    A Machine will be considered truly intelligent when it can translate all emails on slashdot into a usable form. Since spammers are some of the most persistent and aggressive users and developers of technology, I expect we'll have real AI telling us how to enlarge our penises by next Thursday.

    --
    ---- El diablo esta en mis pantalones! Mire, mire!
  2. "Slashdot as a minable database of ideas..." by theonomist · · Score: 4, Funny

    Oooookay.

    Sir? Please step away from the bong.

    I just spent an enjoyable half hour or so reading Business 2.0's "minable database" of 101 Dumbest Moments in Business, and then I had a look at their even-more-hilarious 100 Dumbest Moments in e-Business. This article really does have that weird flavor of megalomaniacal Internet-hype gibberish that we all came to know so well during the boom years. In a way, it's a pleasant little nostalgia trip to see the same old idiocy presented with the same old mindless confidence, but in another way it's just depressing.

    Reality Check: Slashdot is a BBS for bored IT workers taking a break while installing nine hundred copies of Word on nine hundred 266 MHz beige boxes at the local credit union. It is not a minable database of ideas (or at least not of ideas worth mining). At its best, it's an undergraduate bull session.

    What the hell are you people smoking?

    --
    "Offtopic, Inflammatory, Inappropriate, Illegal, or Offensive" -- hey, that's me!
  3. Slashdot by rbgaynor · · Score: 4, Funny

    Interesting, my mining of hot ideas on Slashdot has determined that a Beowulf Cluster of First Posts is the next big thing...

    --
    "Good things don't end with eum, they end with mania or teria." - H. Simpson
  4. Forward this to the Director of IT, stat! by johncheng · · Score: 4, Funny

    This article will have great importance to our director of IT, since the way our company stores data seems to be completely unstructured.

  5. This Is Like Mining Money by Anonymous Coward · · Score: 5, Funny

    "email identified by interpretation rather than keywords"

    Report: The attached email messages indicate a successful business plan. This simple way to make money fast by selling pamphlets is interpreted as being good: it has been confirmed by many quotes within the email, by repetition in many similar emails, and by the suggested calculation of potential return.

    Opportunity: There is an unfilled business opportunity which is confirmed by the lack of existing businesses which use this plan. Searches of local and national databases have not found any businesses which are using this method.

    Suggestion: Give me a dollar so I can start a business.

  6. XML won't make it by mangu · · Score: 4, Insightful
    To encode information in XML is as much work as doing it in SQL or any other language. What is needed is artificial intelligence, to take any data source, be it a picture, text, music, or whatever, and classify it. Some examples of what I have wished for:

    - show a text and find other texts about the same subject.

    - hum a tune and find an mp3 of the same music.

    - show a picture and find other pictures of the same girl.

    - better, show a picture of a girl's face and tell your search engine to find nude pictures of the same girl...


    Until those simple tasks can be done easily, we will be stuck with the 13500 links one gets when searching for "christina ricci nude" in Google.