Creating a Computational Linguistics College Degree?
$random_var asks: "I am an undergraduate student currently studying Bioengineering. However, I am growing more and more interested in programming and linguistics, which leads me to think that I should define my own major, Computational Linguistics [mit.edu], which Google defines as 'a field concerned with the processing of natural language by computers.' When I present my proposal for this degree to the school's advising staff, I would like to have a complete list of all of the topics this major should cover. Having only a little experience with computer science and engineering, I'm not sure what parts of that field I should include. Beyond the basic lower-division courses, what specific fields of computer science do you think should be emphasized in a practical undergraduate study of Computational Linguistics?"
Computational Linguistics, or Natural Language Processing (NLP) as it was called where I studied, is one of the many areas that traditional Computer Science is branching into, alongside things like biology (bioinformatics).
I'd say the first two years in the major need to be very similar to the first two years of the Computer Science curriculum, and the last two in the humanities/linguistics area. I say this because a lot of the math, basic computer science, etc. that is needed in the field will be the same as the core Comp Sci/engineering classes. However, the first two years in the humanities are very generic, and you specialize a lot later. So build up the core of Comp Sci and linguistic study early.
Then, get deeper. A lot of the NLP work is being done in Machine Learning/Data Mining classes. Make sure you take those; we had a whole class called Textual Data Mining at my university. Take algorithm design and some of the common advanced Comp Sci classes too, since a lot of the techniques are advanced, still being developed, and will require research. Take advanced statistics classes too; much of the field is built on statistics.
NLP is a very interesting area, but I don't know if it deserves its own major yet. I would advise majoring in Comp Sci, with a concentration in Machine Learning/Data Mining through your technical electives, and a minor, or perhaps a second major, in Linguistics. I'd say minor because then you could take only the classes relevant to the field, instead of all the other stuff related to a humanities major that you may not want.
Anyway, that's my take. I did a bunch of NLP research, even getting work published as an undergrad, and I was a Computer Engineering major. The field is so new and emerging that not much formal linguistics study is required right now; if you are a native English speaker you are probably good. But it doesn't hurt to get a more formal background in it, which is why I suggested the minor.
-"Those who fought today will die tomorrow."-
I concur. In fact, I did what you are trying to do, complete with an Engineering Open House project that conversed and stored info in a semantic network. Then I had wonderful interviews with various NLP heavyweights (of the era). When one gentleman said, "I'm going to do everything I can to see you get an offer", I realize in retrospect that he meant, "...since you only have a Bachelor's Degree." CS, lots of graph theory and algorithms. Take a few linguistics courses on the side. What he said. Then do this in grad school.
It involves taking a canon of text (for example, the complete works of Shakespeare, or every example of written English you can get your hands on, or every transcript of spoken English you can get your hands on) and subjecting it to a statistical analysis of which chunks (that is, words, phrases, what have you) are likely to follow which other chunks.
While the outputs tend to make little sense, it is "interesting" to see what kinds of "statistically probable" examples of language a computer can make based on the training (input) you've given it.
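For anyone who wants to try it, here is a rough sketch of that kind of statistical analysis in Python. It is only one simple way to do it (a word-level Markov chain), not any particular course's or researcher's method. The names build_model and generate and the tiny corpus are made up for illustration. The model records which word follows each chunk of `order` words, and the generator walks those counts at random.

```python
import random
from collections import defaultdict


def build_model(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        # Repeated followers are kept, so frequent ones get picked more often.
        model[key].append(words[i + order])
    return dict(model)


def generate(model, length=20, seed=None):
    """Produce up to `length` words by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    key = rng.choice(sorted(model))  # start from a random chunk seen in training
    out = list(key)
    for _ in range(length - len(key)):
        followers = model.get(key)
        if not followers:  # this chunk only appeared at the very end of the text
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        key = key[1:] + (nxt,)  # slide the window forward by one word
    return " ".join(out)


# Toy training text; a real experiment would use a much larger canon.
corpus = "to be or not to be that is the question"
model = build_model(corpus)
print(generate(model, length=12, seed=0))
```

With a corpus this small the output is mostly a reshuffle of the input. Feed it a few megabytes of text and raise `order` to 2 or 3, and the results start to look locally plausible while still making little sense overall.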