Linear PCM vs. logarithmic (on "Is Louder Better?") · Score: 3, Interesting
I've always wondered why (more or less) permanent audio storage formats like CDs or DAT use linear PCM when it's fairly clear that the human auditory system uses a logarithmic transfer function. Wouldn't we be better off using 16-bit logarithmic samples instead of linear samples on CDs and such?
Note also that the article points out the legitimate uses for pushing up the volume without any distortion. For example, many pre-1980s recordings are now getting a second workover: the original release was on vinyl, then there was the simple 1980s digital transfer to CD, and now many classic recordings (e.g. most of Rudy Van Gelder's recordings for Blue Note) are released a third time after 24-bit remastering and mixing. (Plus there are the Japanese 20-bit releases from the 1990s.) This does make sense: when transferring your final 24-bit mix to a clunky old 16-bit audio CD, you need to keep the volume as high as possible without introducing distortion, because if you don't, you lose detail in the softer passages when you drop the least significant byte of each sample. So louder is in fact better, as long as you don't clip the peaks.
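To put numbers on the "drop the least significant byte" point, here's a rough Python sketch. The sample values are made up, and the function names (`to_16_bit`, `normalize_gain`, `master`) are mine; real mastering also involves dither and limiting, which this ignores entirely.

```python
def to_16_bit(sample_24):
    """Drop the least significant byte of a signed 24-bit sample."""
    return sample_24 >> 8

def normalize_gain(samples_24):
    """Largest gain that keeps the loudest peak inside the signed 24-bit range."""
    peak = max(abs(s) for s in samples_24)
    return (2 ** 23 - 1) / peak if peak else 1.0

def master(samples_24):
    """Apply the peak-normalizing gain, then truncate to 16 bits."""
    gain = normalize_gain(samples_24)
    return [to_16_bit(int(s * gain)) for s in samples_24]

# Made-up 24-bit samples: three quiet ones and one moderately loud peak.
samples = [100, -200, 150, 2_000_000]
print([to_16_bit(s) for s in samples])  # quiet samples collapse to 0 or -1
print(master(samples))                  # quiet samples keep a few distinct levels
```

With the gain applied first, the quiet samples land on different 16-bit values instead of all collapsing to nearly nothing, and the peak still sits just inside the 16-bit range.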
Just read the QuakeAID article on Wikipedia (and look at the edit history from January 2005) to get a pretty good idea of what QuackAID and the people behind it are really about.
Folks have posted Mathematica and other code here, which, while very elegant, treats this as a math problem. However, the problem can be viewed as a simple text processing task and solved by standard Unix tools, pretty much on the command line. The first step is to get the file ee710.txt from Project Gutenberg, e.g. here. Then it's a simple matter of using an AWK or Python script to generate ten-digit substrings and pipe them into factor.
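For the Python route, something like the following would do. It's only a sketch: it assumes you've already cut the Gutenberg header off so that the text contains nothing but the digits and whitespace, and the trial-division `is_prime` is my stand-in for piping the numbers into factor. The function names are made up for illustration.

```python
def digit_windows(text, width=10):
    """Yield every run of `width` consecutive digits, skipping non-digit characters."""
    digits = "".join(ch for ch in text if ch.isdigit())
    for i in range(len(digits) - width + 1):
        yield digits[i:i + width]

def is_prime(n):
    """Plain trial division; fast enough for ten-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def first_prime_window(text, width=10):
    """Return the first `width`-digit window that is prime, or None."""
    for w in digit_windows(text, width):
        # A window with a leading zero is not a true `width`-digit number.
        if w[0] != "0" and is_prime(int(w)):
            return w
    return None
```

Usage would be along the lines of `first_prime_window(open("ee710.txt").read())`, once the non-numeric front matter is out of the way.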
In Linux this is known as cryptoloop, and you get a choice of different ciphers.
Instead of/in addition to posting about the error here, why not send off a note to the Times to let them know about the important flaw in their coverage of this story?
I know for a fact that several people sent letters to the Times. It worked. The Times posted a correction today.
Maybe that's why I bank with a credit union. I prefer to use the extra money I save each month off bank fees and increased interest in my savings accounts to brew my own damn coffee.
I did some research on banks and credit unions where I live (New York metro) and concluded that Washington Mutual and Greenpoint Bank offered the best value to a regular guy like myself. Almost every bank, except for the biggest few, beat my employer's credit union hands down. Washington Mutual and Greenpoint have some very nice interest-bearing checking accounts. My local credit union pays no interest on checking and lousy rates on savings, and though its fees are low, this is offset by the annual membership fee. This trend is only going to get worse, with more and more credit unions being consolidated and merged.
Unless you're lucky enough to have a well managed, profitable credit union, your best bet is to choose a small or medium bank. Forget the big players, some of which are already close to their maximal size (controlling close to 10% of all deposits with FDIC-insured banks), since they tend to charge high fees to ordinary customers.
You would be correct if the purpose of the test was to distinguish excellent writers from merely very good writers in a creative writing course. However, the issue is quite different. The developers of the grading software know quite well that automatic grading cannot detect mixed metaphors, clichéd expressions, etc. That's why automatic grading is typically used in grades 6 through 10 (it also doesn't offer enough guidance for complete beginners) and only in "low stakes" situations.
Let me repeat that for you: automatic essay grading won't (ever, most likely) be used to score creative, literary works. But it can certainly help advanced beginners, who are struggling with subject-verb agreement (questions like whether to write "the purpose of companies are" or "the purpose of companies is"), proper spelling, too much repetition, etc. Simple, trivial stuff for professional writers, but important stuff to point out to beginners.
any truly excellent writer is likely going to do badly
Not really. Because what students are writing are five-part high school essays, not highly creative works of fiction. Simple stuff, like thesis, three supporting points, conclusion. You know, for kids. An excellent writer would realize what's expected of her and write accordingly.
Writing is not mathematics.
Well, you can also say "X is not mathematics" for almost all values of X. That doesn't mean that X doesn't have commonly accepted structures and regularities. High school essays form a cohesive text genre in terms of their structure. So do cooking recipes, job ads, you name it. Sure, kids are graded on how well they can conform to the expectations, but that is true whether or not computers do the grading.
Is this a communist system where no dissent is allowed?
Minor slip of the pen: you don't mean communist, you mean totalitarian. At least in theory, the fact that the people own and control the means of production has nothing to do with whether dissent is or isn't allowed. (Though in reality, all attempts at communism developed quickly into totalitarian regimes.)
I agree with the rest of your comments.
The first one or two years in a North American (US/Canadian) graduate program are often not very different from the last two years in an undergraduate program: lots of required courses, not much research. I'm not saying that's bad: it can be good if you don't know what you want to do exactly, if you're entering a new field and need to catch up quickly on the basics, etc. North American graduate programs are therefore often longer than elsewhere. If you know exactly what you want to do and don't want to spend much time taking classes, look into the top schools in the UK: Edinburgh, Cambridge, Sheffield, etc. Most UK graduate programs (they're called post-graduate programs there) are focused much more exclusively on a research topic, right from the beginning (as I said, that can be good or bad). If you're seriously considering going there, look for studentships on jobs.ac.uk or other pertinent message boards. A studentship will typically provide you with three years of funding, which is considered sufficient for finishing a PhD in the UK (don't know if it actually is sufficient), compared with the nominal five years in North America.
If you want a "natural" language for computers then it would have to be necessarily of Chomsky-0 type.
Whoa, why would that have to be the case? People have made all sorts of arguments for some natural languages being context-free, regular, star-free, etc. In any case you can have very expressive subsets that are context-free. If natural languages could only be expressed by Type-0 grammars, then we would have a real problem explaining how humans process language.
it's well known that the grammar for all human languages follows the same basic rules (Chomsky's hypothesis)
It doesn't matter whether it's well known or not -- it's a hypothesis. (And not a particularly concrete one at that. If you don't say explicitly what the common structure of all languages is, it's pretty much devoid of content.) But suppose you have a specific hypothesis that says something about all human languages. Then it makes sense to test it against as much data as you can get your hands on, and that includes languages that are about to become extinct.
This has nothing to do with the Linux kernel per se.
Still, the file system hierarchy is basically fine the way it is defined in the Filesystem Hierarchy Standard, LSB, etc. FHS has been around for a long time, at least eight years, as far as I can recall.
If I compile software that isn't already part of a distro myself, I tend to configure those packages with --prefix=/usr/local/stow and then use stow to install symlinks under /usr/local. That's pretty close to what you're suggesting, no?
Yeah, right. Singing synthesis is about as old as speech synthesis itself. Here's an example from 1958 (item 11 on the linked page). By 1961 singing synthesis had become fairly mainstream -- witness the singing HAL in "2001".
So they're claiming they own the copyright on errno.h. This is insane. Even if there are substantial similarities between Linux's various errno.h-s and SCO's version, how many ways are there to implement errno.h? It's a bunch of friggin' macro definitions with more or less standard names and more or less standard values. Someone correct me if I'm wrong, but I thought one could only copyright original works, and what's original about a bunch of #define-s?
Freeciv is multiplayer and works on about any machine that can run X comfortably. If you want more action, may I suggest the classic XPilot?
You need to consider what you primarily want to do with your system. Are you taking a photography class? Then you probably need spot metering, which you don't get in all of the cheapest bodies. Do you want to do stop-action shots? Then you need fast shutter speeds and fast auto focus and/or predictive auto focus. Do you want to do close-up work? Architecture? Portraits? That determines your choice of lenses. Without knowing any further details, I'd say get a Nikon N65 (around $180, lacks spot metering) or N80/F80 body (around $350, has spot metering, better autofocus) and a Nikon 50mm f/1.8 lens (around $90). That's about as cheap as you can get for decent to excellent gear (the 50mm lens is a keeper). Get a good book and lots of film. Buy from a dealer that lets you try the equipment or return it if it doesn't work for you.
You'll get a lot of advice on the Canon vs. Nikon debate. It's like Perl vs. Python, csh vs. sh, C++ vs. Java (meaning, there is a clear answer, except that it's different for everyone). For me it came down to this: which system is it easier to borrow lenses for? Where I live the answer is Nikon, but your mileage may vary. A Nikon 80-200mm zoom lens is a heavy and very expensive piece of glass that I might need once in a while for a weekend, but I could never justify the expense of buying one for myself. So I rent. And for me the only option in this department is Nikon. (But then again, you could just as easily rent a Nikon body together with the lens.) It was almost as simple as that.
Is he just trying to come up with an impressive looking formula...?
It's a linear combination of weighted attributes. How unimpressive is that? At least they should show us a list of games together with their attributes and sales rank. Given that information, we could do a least-squares fit (linear or nonlinear) ourselves, and, more importantly, evaluate the goodness of fit.
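If they did publish such a list, the fit itself would be a few lines. Here's a sketch in plain Python that solves the normal equations for a linear model and reports R-squared as the goodness of fit. The data below is entirely made up (two hypothetical attributes per game and a made-up score), chosen so that the exact answer is known.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear(rows, y):
    """Least-squares weights (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

def r_squared(weights, rows, y):
    """Fraction of the variance in y explained by the fitted model."""
    mean = sum(y) / len(y)
    ss_res = sum((t - predict(weights, r)) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Made-up data generated from score = 1 + 2*a + 3*b, so the fit should be exact.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (0, 2)]
y = [3, 4, 6, 8, 7]
weights = fit_linear(rows, y)
print(weights, r_squared(weights, rows, y))
```

With real sales data the interesting number is the R-squared: if it comes out low, the formula isn't explaining much, however impressive it looks.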
Experts hope this will solve a predicted IP address shortage as more devices are created to use the Internet.
This falls into the general category "Death of Internet Predicted". The internet is not running out of IPv4 addresses at the rate predicted in the early '90s, for a number of reasons, including NAT (whether you like it or hate it) and the simple fact that not everyone who wants to browse the web needs a publicly routable address.
A much better reason for adopting IPv6 is that autoconfiguration is to a large degree built into the protocol (including its associated ICMP messages) and doesn't have to be done by a separate mechanism like DHCP. Also, IPv6 has a small, fixed-length packet header, which should make it easier to do all sorts of routing tasks.
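To show how little machinery the classic form of autoconfiguration needs, here's a Python sketch of deriving an address from a /64 prefix and the interface's MAC address using the modified EUI-64 scheme. The function names are mine, the MAC and prefix are example values, and plenty of hosts these days pick their interface identifiers differently, so treat this as an illustration of the idea only.

```python
import ipaddress

def eui64_interface_id(mac):
    """Modified EUI-64: flip the universal/local bit and insert ff:fe in the middle."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address like 00:11:22:33:44:55")
    octets[0] ^= 0x02
    full = octets[:3] + [0xFF, 0xFE] + octets[3:]
    return int.from_bytes(bytes(full), "big")

def autoconfigured_address(prefix, mac):
    """Combine a /64 prefix (e.g. one learned from a router) with the interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this scheme assumes a /64 prefix")
    return ipaddress.IPv6Address(int(net.network_address) | eui64_interface_id(mac))

print(autoconfigured_address("fe80::/64", "00:11:22:33:44:55"))
print(autoconfigured_address("2001:db8:1:2::/64", "00:11:22:33:44:55"))
```

The host needs nothing but its own MAC address and a prefix announced on the link, which is the point about not needing a separate mechanism.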
If you're running a Linux or BSD kernel, check out one of the many 6to4 tunnel brokers to get onto the 6bone or your own friendly neighborhood IPv6 backbone.
I don't mean to troll, but I hope it's not too late to put an end to the unfortunate term "Bayesian spam filtering". This is perhaps the worst abuse of the adjective "Bayesian" I've seen, because nothing crucially depends on the application of Bayes' Theorem and/or on the use of Bayesian methods (informative priors, model selection, etc.). Why not simply call it "data driven spam classification" (as opposed to "rule based") or "empirical spam filtering"?
If the spam disaster had struck fifteen years ago, we'd all be talking about "neural spam filtering" (using artificial neural networks, ANNs) and basking in the warm fuzzy feeling imparted by the term "neural". But ANNs and Bayesian classifiers have the same interface: both are trained on labeled data and can be used to classify unlabeled data. The implementation details are not of primary importance, and if you think they are, I'd encourage you to look into large margin classifiers instead of Naive Bayes or ANNs.
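To make the "same interface" point concrete, here's a toy word-count Naive Bayes classifier in Python with add-one smoothing. The class and method names are my own, and the training messages are invented. The only thing a mail filter would see is `train` and `classify`; an ANN or a large margin classifier could sit behind the same two methods.

```python
import math
from collections import Counter

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def train(self, labeled_docs):
        """labeled_docs is an iterable of (text, label) pairs."""
        for text, label in labeled_docs:
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in text.lower().split():
                score += math.log((counts[w] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training data, just to exercise the interface.
clf = NaiveBayes()
clf.train([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
])
print(clf.classify("cheap pills"), clf.classify("monday meeting"))
```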
but doesn't use Intel components. The description of the software components is strikingly similar to that of the Sharp Zaurus 5x00 series. If it's cheaper than the Zaurus, I might consider buying one. Unfortunately it doesn't seem to include some of the hardware that's not standardly available on the Zaurus, most importantly 802.11b. What a shame.
Just because it's free-as-in-speech doesn't mean you can't charge for it. If Joe Average Consumer doesn't trust a product that costs nothing, then either hide that fact from him, or explicitly charge him for it (you're allowed to charge whatever you think the market will bear for the binaries, as long as you make the source available for a reasonable fee). If you're in the business of selling stuff, then for Todd's sakes sell it, don't try to convince people that they should accept it for free on philosophical grounds. Don't even bring up the whole issue. Why should Joe A. Consumer care what OS is on his machine as long as he can surf the web and send email?
One thing that's always irritated me with linux is how difficult it can be to REMOVE an application.
Use stow.
Ever heard of AnonImous Coward?
to measure entropy or redundancy. Why not do that directly? A program to measure 8-bit entropy is no more than a few dozen lines of C, or one could simply "apt-get install ent".
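For what it's worth, in Python rather than C it's even shorter. This sketch computes the Shannon entropy of a byte string in bits per byte; the function name is mine.

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (between 0.0 and 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Feed it the contents of a file opened in binary mode: values close to 8 mean the data looks random (or is already compressed), and values well below that mean there is redundancy left to squeeze out.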
are available from nausicaa.net: http://nausicaa.net/miyazaki/sen/theaters.php Tell them if it's playing in your area.
OK, I'll bite:
There once was an element named Zinc
which had had one too many a drink.
It consumed (more appropriate
near St Patrick's day) opiate
and was forcibly sent to a shrink.
There once was an element named Zinc,
a nutrient they put in a drink
like Gatorade ("Is It In You?")
{child posts please continue}
and in galvanized alloys and ink.