Slashdot Mirror


Linguistics Meets Linux: A Review of Morphix-NLP

Emre Sevinc writes "Zhang Le, a Chinese scientist working on Natural Language Processing, has decided to pack the most important language analysis and processing applications into a single bootable CD: Morphix-NLP. More than 640 MB of NLP-specific software is included, and there's still a lot of space left on the CD, which uses a compressed filesystem to bring us the best of both worlds."

8 of 186 comments

  1. that's pretty cool by homerjs42 · · Score: 3, Insightful
    This is a pretty cool thing. It seems like the kind of thing that would be of great use to anthropologists or others translating from a language that is more or less unknown. By unknown, I mean not used commonly outside of its people group, and probably unwritten.
    Neat.

    --dw

  2. Re:Good Chinese Compression by log2.0 · · Score: 2, Insightful

    I would say that Westerners cannot pronounce simple Chinese.

    English is the only language I know, but I studied Mandarin Chinese for a few years.

    There are all sorts of things in there that we have a lot of trouble pronouncing.

    --
    Can your karma go above being Excellent?
  3. Natural languages useful for spam filters? by joelparker · · Score: 2, Insightful
    Can anyone here comment on if/how
    any of these natural language tools
    can be helpful for spam filtering?

    Cheers, Joel

  4. Re:Chomsky and stuff by idlemachine · · Score: 2, Insightful

    I both agree and disagree: life *is* that complicated, we just haven't yet come up with workable abstractions for a lot of things that allow us to handle them in the simplified manner you're asking for.

    What you're seeing here is the process by which that happens. Chomsky especially is someone whom I don't consider to want to "make [things] out" to be more complicated than they are; on the contrary, he seems to be more about wanting to understand the *true* process that is at work, not the pre-accepted social fiction that we might currently use as an explanation.

  5. Random musings from an ex-linguist. by Charles+Dodgeson · · Score: 4, Insightful
    I'm a PhD drop-out in linguistics, and happen to know precisely what a head-lexicalized context-free grammar is. (And, no, reading Chomsky is not the way to find out what it is.) Below are some random musings on the geekiness of linguists.

    Linguists have always been geeky. Don't forget that Larry Wall is a linguist first.

    The only computer class I ever took, in 1983, was called "Computer tools for natural language analysis". It was an introductory Unix course. We learned grep, awk, sed as well as tools like vi, Mail, and rogue. And a tiny little bit of C. But since then I've taught C at the graduate level.

    Linguistics is all about the representation and manipulation of information. But instead of it being about languages we design for particular purposes, it is about the language system that we use naturally.

    Suppose you have a few thousand languages that you know were written with the same tools (like lex and yacc, but not lex and yacc), but you have no access to those tools. Suppose you are trying to figure out what those tools are from examining the languages (not the compilers) that have been specified using those tools. That is what theoretical linguistics is trying to do. We know that the specification of English and the specification of Dyirbal and every other human language out there are somehow "written" with the same tools. It's pretty neat stuff.

    Linguists were early adopters of TeX, have had a Unix affinity for a while, and as people who are interested in how information is internally represented and manipulated, like reading the source.

    I remember once nagging the sys admins to always make sure that there is a man page for anything added to /usr/bin or /usr/local/bin. The next day, they asked me to look at the manpage for something to see if it met with my approval. The DESCRIPTION was the C source. I was happy to say that it did, indeed, meet with my approval.

    At one point, a well-known professor (Geoffrey Pullum) had written a little essay for a newsletter on the "grammar of Unix" using linguistic-style analyses of the shell. Naturally several of us feigned outrage at his confusion of "Unix" with the shell. Another linguist (Bill Poser) went so far as to write a shell which was verb (command) final, and post-positional. That is, instead of saying
    cat foo bar > bang
    you would say
    foo bar bang > cat
    That is, the arguments precede the command, and the redirect symbols go after the filename they redirect to or from. Now for various reasons, I had root access on a machine that Pullum used. So I changed his shell to this command-final one. He actually caught on remarkably quickly. And after a quick
    /bin/sh chsh
    he was ready to concede the point.
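
    For anyone who wants to play with the word order described above, here is a minimal sketch in Python that rewrites a command-final, post-positional line into conventional shell order. It is not Poser's shell: the function name to_conventional is made up for illustration, and it assumes plain whitespace-separated tokens (no quoting) and only the redirect symbols >, >> and <.

```python
# Hypothetical helper, not taken from the original shell: it only
# illustrates the word order described in the post.
REDIRECTS = {">", ">>", "<"}


def to_conventional(line):
    """Rewrite 'args... file > command' as 'command args... > file'.

    Assumes whitespace-separated tokens without any quoting.
    """
    tokens = line.split()
    if not tokens:
        return ""

    # The command is the final word of the line (verb-final order).
    *rest, command = tokens
    if command in REDIRECTS:
        raise ValueError("line must end with a command")

    out = []
    prev_is_plain = False  # True when the last token could be a filename
    for tok in rest:
        if tok in REDIRECTS:
            if not prev_is_plain:
                raise ValueError("redirect symbol must follow a filename")
            # Post-positional: the symbol follows its filename, so move
            # the symbol in front of the filename it belongs to.
            filename = out.pop()
            out.extend([tok, filename])
            prev_is_plain = False
        else:
            out.append(tok)
            prev_is_plain = True

    return " ".join([command] + out)


if __name__ == "__main__":
    print(to_conventional("foo bar bang > cat"))
    print(to_conventional("/bin/sh chsh"))
```

    Feeding it the two lines from the story gives back the ordinary forms, cat foo bar > bang and chsh /bin/sh.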

    For me, there is no surprise that linguists, and particularly computational linguists, are OSS enthusiasts. But that is enough of my random musings for now.

    --
    Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
  6. Re:Good Chinese Compression by Anonymous Coward · · Score: 1, Insightful

    there are something like 80 phones of linguistic merit capable of being produced by humans. english has like 40, i think.

    and any linguist will tell you, it's impossible to pronounce something wrong...linguistics is a descriptive, positive science as opposed to a normative, prescriptive science. NO ONE speaks wrong, unless they think they do, i.e. a speech error

    i didn't try very hard in ling 101, it was so easy....

  7. Re:Chomsky and stuff by WFFS · · Score: 1, Insightful

    Um, you forgot Semantics (the meaning of language), one of the more currently important topics.

    I'm doing my BSc, majored in maths and CS, and currently doing honours in CS. However, my project/thesis is on Language Technology, based squarely around semantics (for verbs to be precise).

    Now, my point is basically agreeing with the above poster. I can't really go in depth about my project with the average Joe/Jo, because it is just too complicated. There is too much jargon and linguistic basics that would need to be covered first, and well, that takes up a whole chapter of my thesis, by which time Joe/Jo would've gone back to their game of Quake.

  8. Re:Chomsky and stuff by Toddlerbob · · Score: 1, Insightful
    As someone who likes Chomsky's work (and gets modded down for mentioning it - so I'm glad to see it didn't happen with people this time, though maybe to me, we'll see) and as someone who's studied cognitive science, I agree with this poster and also the poster two steps up.

    That is, yes, things are that complicated. In fact that's a point that Chomsky himself makes, not only in reference to language, but also in reference to economics and sociology. He often says this in reference to economists and sociologists who claim to understand how human societies work, but I could also imagine him saying this in the context of human psychology / linguistics.