Slashdot Mirror


Words That Speak a Thousand Pictures

venolius writes: "The New York Times (free registration required) has an article on TextArc (created by W. Bradford Paley), a site that "aids in the discovery of patterns and concepts in arbitrary text" (from the detailed overview at TextArc). The site serves an applet that performs the task. Texts available for analysis include Alice in Wonderland, Hamlet, and thousands of others, all made available by Project Gutenberg. The NYTimes article reports that Paley found that "Dracula", which relies on a strong storyline, had a few keywords clustered hotly at the center, and that the metaphoric "Frankenstein" generated a circle of 50 words of modest intensity that faded towards the edges. "Portrait of the Artist as a Young Man", with evenly distributed key words, produces tight and round lines, and "Alice in Wonderland" produces loopier lines. Check it out! (the applet was tested on better hardware, but I did well enough with 98/IE6/550MHz/64MB)"

3 of 102 comments

  1. Other tools for exploring the Semantic Web... by tessellation · · Score: 4, Interesting

    ...the one we already have, that is:

    map connections between two words, concepts, or famous names

    see a word's rhymes, synonyms, definitions

    and I leave the rest to you.

  2. Gutenberg by proxybyproxy · · Score: 5, Interesting

    Once again Project Gutenberg shows its beautiful face. If you haven't heard about it before, then read a Wired feature here. Michael Hart started the project years ago and he wants to digitize anything which is out of copyright. The uses are infinite (think of the blind, who can feed texts to tactile printers, for example), as this story also shows.

    Anyway, Hart is a big supporter of sensible copyrights (read the feature) and if you can spare the time, help him by digitizing your favourite book.

    --

    Hurra for Knark!
  3. Re:Please... by Beliskner · · Score: 5, Funny

    it sux that something opens a max window over your desktop without asking permission, chews up cpu without asking permission, and then fscks up as soon as you click on it and has to be killed by hand.

    Better to do a complicated slow query on Google using many keywords, hogging all those read-locks, attacking multiple Google Linux machines simultaneously as inverted file dictionary lookups are performed, forcing them to swap in pages to look up *your* damn query (15ms DoS attack), and then a machine having to perform a sort algorithm on all those results. As if this wasn't enough, you hog the CPU of the machine that parses this into HTML so that your browser can see it, and then a whole fork() has to be done for you OR you steal a thread out of the Apache thread pool to serve you, clogging the routers with queued packets because of your cheap ISP, your slow xDSL connection making Google's Apache *wait* for your ACKs, taking memory to hold such a massive TCP sliding window. Selfish selfish you, doing a DoS attack on Google whenever you perform a search.
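
    For anyone wondering what the "inverted file dictionary lookups" and the sort step above amount to, here is a toy sketch in Python. It is only an illustration under assumed names (build_index, search, and the sample docs are all made up here) and says nothing about how Google actually implements search: each keyword is looked up in a word-to-documents dictionary, the result sets are intersected, and the surviving documents are sorted by a crude occurrence count.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, docs, query):
    """Return ids of documents containing every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    # One dictionary lookup per keyword, then intersect the posting sets.
    postings = [index.get(w, set()) for w in words]
    hits = set.intersection(*postings)

    # Toy ranking (an assumption): total occurrences of the query words.
    def score(doc_id):
        tokens = docs[doc_id].lower().split()
        return sum(tokens.count(w) for w in words)

    # The "sort algorithm on all those results": highest score first,
    # ties broken by document id.
    return sorted(hits, key=lambda d: (-score(d), d))


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "alice fell down the rabbit hole",
    2: "the rabbit was late and the rabbit ran",
    3: "hamlet spoke to the ghost",
}
index = build_index(docs)
print(search(index, docs, "the rabbit"))
```

    More keywords mean more lookups and more sets to intersect, which is the per-query work the paragraph above is joking about.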

    Point: Somehow if it uses *your* CPU it's different, but when Google's machines do all the work it's somehow OK. Next you'll be complaining about websites doing a DoS by forcing your browser to use CPU by rendering HTML and your TCP stack having to store a sliding window whenever you view a webpage. This selfish attitude is why all filesharing software must be redesigned to NOT allow anyone to kick uploaders. If you want to kick uploaders then shut down the filesharing app, but then you lose download karma.

    My personal opinion: Due to the heavy-client structure of the majority of machines, this applet uses the correct CPU (yours). Reduced ad revenues mean that Internet companies can no longer afford massive server farms unless they require Cydoor installed on all client machines; alternatively, they can decrease their equipment costs by delegating processing to clients' (surfers' browsers') CPUs. *nix systems, especially if apps are in a Java sandbox, are heavily protected against most attacks, including process DoS attacks, via kill -4, kill -9 and a scheduler+VM designed to stand up under heavy loads. Poor Microshaft Win9x people have to do Ctrl-Alt-Del, which halts all processes while they look at the scheduler's contents, and it takes 30 seconds of a successful DoS for the offending process to be recognised as "unresponsive and kill -9'able"; otherwise it is merely "kill -4'able". If you don't want them to use your CPU, then don't visit their website, disable the JVM, disable HTML parsing in the browser (this takes CPU), read the raw HTML yourself, disable the TCP stack, write packets by hand to minimise CPU usage or even bypass the CPU completely by setting the modem's line inputs yourself with a logic probe, then probe for the response using an oscilloscope and logic probe. Do PPP-IP-TCP by hand. That won't use your CPU.

    Pop quiz hotshot, this is an Informative flamebait, what mod do you do, what mod do you do?

    --
    A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?