Slashdot Mirror


Mining Unstructured Data

jscribner writes "Data these days tends toward an unstructured form, be it text (like the web, email, or books), spoken word, or even DBs with unique organization (and thus a discrete language). There's a new article on Unstructured Data in Think Research; it's an overview of the challenges, progress, and potential rewards in this area. I'm leaving it on your doorstep because, to me, it's a good launching point for discussion of several interesting possibilities: /. as a minable DB of ideas, email identified by interpretation rather than keywords, emotive XML, etc."

1 of 105 comments

  1. XML won't make it by mangu · Score: 4, Insightful
    To encode information in XML is as much work as doing it in SQL or any other language. What is needed is artificial intelligence that can take any data source, be it a picture, text, music, or whatever, and classify it. Some examples of what I have wanted:

    - show a text and find other texts about the same subject.

    - hum a tune and find an mp3 of the same music.

    - show a picture and find other pictures of the same girl.

    - better, show a picture of a girl's face and tell your search engine to find nude pictures of the same girl...


    Until those simple tasks can be done easily, we will be stuck with the 13500 links one gets when searching for "christina ricci nude" in Google.
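
For the first wish in the list above (show a text and find other texts about the same subject), one common baseline is to weight words by TF-IDF and rank documents by cosine similarity. The following is only a minimal sketch under stated assumptions, not anything proposed in the comment or the linked article: the function names (tokenize, tfidf_vectors, cosine, most_similar) and the sample corpus are hypothetical, and the approach only matches shared vocabulary rather than understanding the subject.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())

    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}

    return [{t: tf * idf[t] for t, tf in counts.items()}
            for counts in counts_per_doc]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Rank corpus documents by similarity to the query, best match first."""
    vectors = tfidf_vectors([query] + corpus)
    query_vec = vectors[0]
    scored = [(cosine(query_vec, vec), doc)
              for vec, doc in zip(vectors[1:], corpus)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample corpus, for illustration only.
    corpus = [
        "Mining unstructured text data from email archives",
        "A recipe for tomato soup with basil",
        "Search engines index web pages",
    ]
    for score, doc in most_similar("text mining of email data", corpus):
        print(f"{score:.3f}  {doc}")
```

A scheme like this finds texts that share words with the query; it does nothing for the tune or picture cases, which need very different features.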