Domain: cpan.org
Stories and comments across the archive that link to cpan.org.
Comments · 1,172
-
Re:Better than PostgreSQL?
Otherwise at least for me it is in the "not interesting" category until it gets a decent working DBI module compatible with the most recent version.
In what way is DBD::Sybase lacking?
Michael
-
Re:Front End...?
It certainly does! Plus, without ever using Sybase (I'm more of a PostgreSQL fan), I'm fairly sure that Sybase would provide a C/C++ API.
(For those that haven't caught on, Sybase is a competitor to such products as Oracle, DB2, PostgreSQL, etc., and is not comparable to silly little toys such as MS Access.)
-
Re:Joy of programming...
My language experience going back 24 years:
Basic, Fortran, Assembly (Intel and Sparc), Pascal, C, C++, Java, Lisp, shell(sh,awk,sed), Perl, and most recently Python. (roughly in that order; I saw some COBOL code once when a young programmer, but was immediately repulsed - thank heavens)
I actively use Perl and Python myself for everything now for several reasons:
1. All of the machines at my job (800+) are all preloaded with Perl - so I have to use it for automation (better than shell scripts particularly for mission critical one-off applications that have to be fault tolerant but deployed at the whim of our marketing and operations staff). If I didn't have to maintain Python myself on all of those machines, I would port everything over to Python in a heartbeat. However it took me 2 years to get management to agree to loading Perl in the first place - and there is no reason to incur the costs associated with validating a new scripting language for use in our production environment. So I live with it - and keep the footprint small.
2. For all other tasks - I use python.
Some neat things fall out of python that even as a neophyte I can appreciate:
a) clean syntax (if I only had back all the time I spent finding dangling semicolons in Perl, I could take a sabbatical)
b) full featured web development tool (Zope - provides a framework for developing and hosting full service applications - designed to make building products to run under Zope easy - separates the presentation from the logic using ZPT cascading style sheets and DTML for presentation, and Python for the program logic [unless you are masochistic enough to depend upon DTML alone] - has a built-in database for managing Zope objects - and built-in httpd and ftpd servers - which can be further frontended using Apache as desired - can communicate with other databases [Oracle, ODBC, PostgreSQL, etc... many database plugins available] - has a large library of predeveloped products [modules for you perlmongers] that you can load and be up and running, or modify to your heart's content - and did I mention that it's GPL'd?)
c) platform independent (just as with Java and Perl, Python scripts can run without modification on many operating systems - keeping porting costs down to a bare minimum)
d) built-in documentation functionality - not as full featured as Perl's perldoc - but I might not have found the right product yet to do that (ideas anyone? or, is this a Python project waiting for me to jump on - perhaps something that outputs XML?)
The only drawbacks (and I use this term with trepidation - because they can seem positively refreshing after 10 years with perl) that I can see are:
A. Does not have the sheer amount of user contributed products (modules) when compared to CPAN. Of course I wouldn't judge the quality of my career based on the weight of all of my program printouts either. Quantity does not equate to quality.
B. Slower than Perl and Java. Again, something I can throw hardware at to rectify. Squid goes a long way to making web pages generate faster too - so you can ameliorate some of the problems without having to kill yourself.
C. Sometimes it takes longer to find resources online than with other languages because of the difference in popularity. However, the time spent needing to refer to reference material for Perl and Java is many times larger than the time spent doing the same with Python.
D. Because of my long experience with Perl, I find myself immediately jumping to a predetermined algorithm/function that is implemented differently in Python and thus create syntax errors in my code. This last is really a personal problem that time will erase.
My whole programming paradigm has changed. The advent
-
Re:Use a pipe and utilities
PDF files
strings filename | grep text
I'm guessing you've never tried that search before. PDF stores the meat of a document in compressed data streams. strings would return a bunch of font names, headers and compressed garbage.
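To make the point concrete, here is a minimal Python sketch of why a plain byte search comes up empty. It assumes the page text sits in a Flate (zlib) compressed stream, which is common in PDFs but not universal; the sample text and variable names are made up for illustration and no real PDF is parsed.

```python
import zlib

# Hypothetical page text, standing in for the content of a PDF stream.
text = b"The quick brown fox jumps over the lazy dog. " * 4

# Flate-encoded PDF streams hold zlib-compressed bytes like these.
stream = zlib.compress(text)

# A raw byte search (what strings | grep amounts to) sees only compressed data.
found_in_raw = b"brown fox" in stream

# Inflating the stream first makes the text searchable again.
found_after_inflate = b"brown fox" in zlib.decompress(stream)

print("raw search:", found_in_raw)
print("after inflating:", found_after_inflate)
```

Any tool that wants to search PDF text has to do the equivalent of the decompress step (and then decode the font encoding) before matching.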
There are a few other tools available, at various stages of stability:
-
Isn't this all conjecture?
I mean, that's great that Ryan Morris, owner of XMfan.com and one of those selling PCRs on eBay, says they're discontinued. But, why would XM possibly do that?
There's obviously a demand now. They haven't taken the site down. I'm not sold on the idea that this is the end of the product line. Maybe there's something similar coming out, maybe they just need to manufacture some more.
TimeTrax certainly was not the first (or at the least, not the only) software to offer XM to MP3 ripping with the PCR.
-
Re:No thanks and fuck him and fucking language
Seriously, the guy is basically the computer anti-christ from Revelations.
He comes and people proclaim him the savior; only after everyone has been well marinated in the OO KoolAid(tm) do people realize, damn, we've been had, this whole thing sucks, is not productive, and the prophesied code reuse never happened.
It turns out the drunken monkey fulfilled the code reuse promise.
If I ever met the guy, I would beat him within an inch of his life for all the fucking extra work I've had to do to pull these Java-Only idiots through even the simplest troubleshooting or development tasks. Oooh! Java is so great - good then learn how to open a fucking socket listener and handle a few connections.
Sorry, I'm very bitter from doing all the work while a team of asses sat around complaining that there was work to be done and collecting a salary.
-
M$ == B$
M$ is full of crap. They said they would increase the storage limit, and now they are saying it again. Hotmail is STILL at 2mb.
I use the WWW::Hotmail perl module to forward my email to Google Mail anyway. So in a sense, I do have 1gig of space with hotmail.
-Xantus
-
Perl fans...
-
I'd killed Java
-
Re:Java? Python? PERL?
This made it on the front page? Lame.
Agreed..
In any case I couldn't imagine that it'd take more than half a day or so to do this in Java or Python.
Or five minutes in Perl with MP3::Splitter:
perl -MMP3::Splitter -e 'mp3_split($_,{},[ rand(64800), 30 ],
...) for @ARGV' filename.mp3
-
Re:The Java Problem
search.cpan.org, perhaps? Otherwise, you might like perldoc -f function for documentation on a function or perldoc -q searchterms to look up a question in the FAQ.
-
Let the mice decide...
I guess when they thought up that gigantic computer called Earth, they surely put some kind of message in. It is probably a question like: "What is the question to the answer to life, the universe and everything, the answer being 42?"
Does CCAA CCAAAAGTCAGTTCCTCGCTATGTAACA fit the question, or do we all just carry a piece of Perl script with us?
-
Re:Prior art database
I thought that's what CPAN was.
-
Re:Mining CPan
I think the Python work would be interesting. I'm a long-time Perl coder and Python looks interesting. But IMHO, PHP would be a waste of time. Part of the reason CPAN is so huge is that perl5 is coming up to its 10th anniversary. The perl5 language has remained very stable over that time. But PHP5 has just been released and from what I've heard it's another major change to the language. But if it's got namespaces and/or sane package management like everyone's been begging for, then PEAR might start to really pick up. I guess we'll see in another 10 years. Either PEAR will be a huge success, or programmers will reminisce, "Remember PHP? I think someone's coded up a Parrot compiler for that old language." :)
-
Re:PERL programs are hard to distribute
You know, if you took a second to search CPAN, you'd find that your assertion is not at all true.
Also, it's "Perl" as the name of the language and "perl" as the name of the interpreter. They aren't acronyms; "PERL" doesn't exist.
-
Re:Java Vs. perl
perl. I'm sure you could do it, but I certainly wouldn't want to maintain it. Besides that you'd have to start from scratch and create a bunch of library code before you could even think of starting.
You've obviously never seriously used perl since you don't know about CPAN and its nearly 7k modules.
I can understand endorsing your favorite language. But to try to do it by spreading ignorant FUD about other languages is not a wise method.
-
Re:Java
Perl lets you write code the way you want. It lets you write obfuscated code that is impossible to maintain two months down the road. It also lets you write code that is clean, modularized and extremely easy to maintain.
Perl's strengths are not only in its dynamic nature, the CPAN module archive and its user base. It is also a very powerful language that is beautiful to those who have more coding experience than Teach yourself X in 21 days.
-
Best for databases
As for java, it's one of the primary languages for easy interface with databases
perl is great for databases, too; look into the DBI:: classes some time. The thing that makes Java useful is the commercial toolkits that easily build GUI forms, JavaBeans, etc.; however, the main interface that big companies need to their database is either their website or a simple locked-down interface to a transactional backend. PHP and ASP easily compete with JSP for the first, and a perl program running in an SSH instance might be the best environment for new development of the other. (However, I don't have a production system to back that claim up.)
-
Then you don't know the right programmers
Spend some time somewhere like Perl Monks and you'll find out that there are a lot of competent Perl programmers who aren't sysadmin types. Or go and look at CPAN to see the variety of different kinds of software that are available in the Perl world.
Furthermore I'd like to point out that Graham made a claim about the behaviour of hackers, not open source programmers. Most of the people who contribute to Sourceforge would not, in Graham's opinion, deserve the compliment of being called hackers. Therefore their aggregate choices are irrelevant.
Disclaimer: I'm a fairly well-known Perl programmer.
-
Re:PHP
PHP is definitely not the only user of GD. Heck, GD is a C library. There are a lot of C apps out there that use it.
Personally, I've only used GD via perl and the many perl libraries that use it, primarily GD.pm.
-
Can't see the tree for all the bark?
Regex is logically equivalent (i.e. there is a polynomial time mapping between the regular expression language and the BNF) to an LR grammar, though it is more compact. I don't think the authors intended it as a substitute until LR got faster. I believe they intended it to be an LR-equivalent language.
I will admit, though, that sometimes I need the full thing, so I used an LR parser to make this.
It is a bit inefficient, actually, but I think it's worth it.
You may also notice if you look at my code that occasionally I use a regex in there every now and then when I know it won't be too expensive.
Regex isn't (always) fast, it isn't pretty, but it works, it's compact, and it has already been heavily tested.
One last thought: the concept of LR grammars was invented as a mathematical way to express regular expressions as a series of states, because states are a lot easier to deal with mathematically than regular expressions.
That's the only real reason. Regular expressions came first, though. They're easier for people to come up with - more natural.
-
Re:Can't see the forest for all the trees?
Just compare a reasonably complex regular expression to the BNF form of a grammar for parsing the same input to see how much easier GLR is to use.
And this is part of why many Perl folks have been eagerly waiting for Damian Conway to release Perl6::Rules. Ooh! Looks like a version 0.03 is finally on CPAN!
-
Linux, WINE, and WWW::Mechanize
Most of the web pages I develop are database driven. I use the WWW::Mechanize module as part of an automated testing solution.
To manually test websites, I run Linux on my desktop. This allows me to test Windows/IE via WINE, as well as Mozilla and Konqueror (which should render like Safari).
It doesn't catch every issue, but it works well for me.
-
Re:Can't see the forest for all the trees?
Regular expressions are nothing more than a hack to make up for the fact that generalized LR parsers were quite inefficient up until a few years ago.
So you argue that the regular expressions in Emacs searching should be replaced with a spec of an LR parser? :-)
Different parsing methods are good for different applications. Check CPAN for Perl LR parsers.
I read /. for moments like finding an argument that regular expressions should be replaced with full grammars! :-)
(And, as the saying goes -- a Fortran programmer can write Fortran in any language. Without coding standards in your group you will have large problems in any language. Perl probably is a bit more demanding. It's a tradeoff against other features.)
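As a rough illustration of "different methods for different applications", here is a small Python sketch in which a regular expression handles the flat job of splitting input into tokens, while a parser handles nesting. The toy grammar (names and parenthesised groups) and the function names are assumptions made up for this example, and the parser is a hand-written recursive-descent one, not an LR parser.

```python
import re

# A regex is a good fit for the flat part: recognising one token at a time.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[()])")

def tokenize(text):
    """Split text into names and parentheses, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Recursive descent over the toy grammar: expr := NAME | '(' expr* ')'."""
    def expr(i):
        if i >= len(tokens):
            raise ValueError("unexpected end of input")
        if tokens[i] == "(":
            items, i = [], i + 1
            while i < len(tokens) and tokens[i] != ")":
                item, i = expr(i)
                items.append(item)
            if i >= len(tokens):
                raise ValueError("missing closing parenthesis")
            return items, i + 1
        if tokens[i] == ")":
            raise ValueError("unexpected closing parenthesis")
        return tokens[i], i + 1

    tree, end = expr(0)
    if end != len(tokens):
        raise ValueError("trailing input")
    return tree

tree = parse(tokenize("(a (b c) (d (e)))"))
print(tree)
```

The nesting depth here is unbounded, which is the part a classic regular expression cannot track by itself; the regex still earns its keep in the tokenizer.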
-
MS HTML etc
You've made many valid points, but anyone who works with the intricacies of HTML will tell you that Microsoft's markup is woeful.
However, I'm sure it would be simple (in both ASP and PHP) to write your own w3 compliant HTML library and serve up different stylesheets based on the client, so it's not really a huge issue.
To answer your original question, Perl has a popular module called Mason which abstracts HTML, and integrates well with mod_perl.
-
My Top Ten Tools
-
Re:Goodbye Perl?
Where's the PHP equivalent of CPAN? PEAR? One of the reasons that PHP is superior to Perl is running a site on MS Windows. Perl on Win32 platforms (sans Cygwin) is a joke.
-
Dev::Bollocks
"The models range from highly business-oriented strategy, marketing, and employee-motivation frameworks, to personally oriented frameworks that help structure time, understand personality conflicts, improve leadership skills, and evaluate career transition opportunities."
That sounds like it came straight from Dev::Bollocks
Tels
-
Little biology?
New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against, say, SpamAssassin, which has made the news recently for its high quality.
Also, the connection with biology was not clear to me. That is, it seems that genetic analysis tools might be very useful, and the idea that spam acts like an organism and has "genes" is great. But it was not clear that this has anything to do with the programming strategy.
For example, the use of a perceptron might be a great idea but to someone not trained in them it is hard to see how a multilayer perceptron would be especially good. Also it is not clear that this is what is used in real world genetic analyses. (For example it would have been interesting if genetic databases and bioinformatics tools like BLAST were mentioned.) Also the Chromosome object does not obviously have anything to do with a real chromosome; it is confusing and made me wonder if there was something I was missing, or was it named that way to sound "cool"? Also it was not clear to me whether any of the dynamics of genetic transcription, or gene crossover, mutation, and selection, have anything to do with this project.
Also I am curious about the choice of programming language. Being a perl fanatic I wonder why that is not being used, and of course perl is great at text, and pattern matching, and the important parts of many modules are invariably in C or C++ already, etc. But also perl is a language of choice for bioinformatics, and there are a number of existing modules, for example BioPerl, which wraps other programs, and Boulder, which is an interesting format that could be used to pipe spam genes to other people's filters. Now I don't know if existing bioinformatics tools could be applicable but certainly these are things that ought to come to mind. And what these tools do is not trivial, and if genes are a valid metaphor for spam components then there is a potential for existing code to be used too. That is something that would be cool.
There are also documented, easy to extend perl modules related to using genetic algorithms or for rolling your own analysis modules, I'm thinking of Genetics and AI::Genetic.
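For anyone unsure what crossover, mutation, and selection mean on the programming side, here is a minimal Python sketch of a genetic algorithm. It is a generic toy and not the article's method or the API of any of the modules named above; the bit-string "chromosome", the fitness function (count of 1 bits), and every parameter value are assumptions chosen only for illustration.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]

def evolve(fitness, length=16, pop_size=20, generations=40, seed=0):
    """Evolve bit strings toward higher fitness; returns the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   0.02, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

# Toy fitness: the number of 1 bits in the chromosome.
best = evolve(sum)
print(best, sum(best))
```

In a spam setting the bits would presumably stand for something like which features or rules are switched on, with fitness measured against a labelled corpus, but that mapping is a guess and not something the article spells out.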
Finally I note the use of the term Corpus. This is really interesting, and suggests the author is into computational linguistics which also represents a massive amount of existing, nontrivial pattern resolution code.
So I'd like to know more about the relationship of both computational biology and computational linguistics to spam. For example, one big part is going to be how to identify genes, or whether you need a generator of pattern matchers that will be able to identify the existence of a gene.
Also there is a short bit about stopping spam by making it literally not pay to spam. I'd like to hear more about how that might be linkable to the biological metaphor.
I don't mean to detract from the work represented by this article, not at all. But I would like to know more about how the system analyzes and exploits the realities of biological dynamics to make a superior antispam tool. For example it would appear that some "genes" might be postulated for links to websites or even mail servers (the vectors of the disease). And some linguistics tools might even help link references to product types as genes.
Finally, and this is just brainstorming really not criticism, I was bothered by the development of a 0 to 1 probability of ham or spam. This is to me the biggest problem with automated filters. I know it can be done, since my antispam method consists of hitt
-
Little biology?New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against say SpamAssassin which has made the news recently for its high quality.
Also it was not clear to me the connection with biology.. that is, it seems that genetic analysis tools might be very useful, and the ideas about how spam acts like an organism and has "genes" is great. But, it was not clear that this has anything to do with the programming strategy.
For example, the use of a perceptron might be a great idea but to someone not trained in them it is hard to see how a multilayer perceptron would be especially good. Also it is not clear that this is what is used in real world genetic analyses. (For example it would have been interesting if genetic databases and bioinformatics tools like BLAST were mentioned). Also the Chromosome object does not obviously have anything to do with a real chromosome; it is confusing and made me wonder if there was something I was missing, or was it named that way to sound "cool"? Also it was not clear to me if any of the dynamics of genetic transcription and whether gene crossover, mutation, and selection have anything to do with this project.
Also I am curious about the choice of programming language. Being a perl fanatic I wonder why that is not being used, and of course perl is great at text, and pattern matching, and the important parts of many modules are invariably in C or C++ already, etc.. But also perl is a language of choice for bioinformatics, and there are a number of existing modules for example BioPerl which wraps other programs and Boulder which is an interesting format that could be used to pipe spam genes to other people's filters. Now I don't know if existing bioinformatics tools could be applicable but certainly these are things that ought to come to mind.. and what these tools do is not trivial, and if genes are a valid metaphor for spam components then there is a potential for existing code to be used too. That is something that would be cool.
There are also documented, easy to extend perl modules related to using genetic algorithms or for rolling your own analysis modules, I'm thinking of Genetics and AI::Genetic.
Finally I note the use of the term Corpus. This is really interesting, and suggests the author is into computational linguistics which also represents a massive amount of existing, nontrivial pattern resolution code.
So I'd like to know more about the relationship of both computational biology and computational linguistics to spam. For example, one big part is going to be how to identify genes, or whether you need a generator of pattern matchers that will be able to identify the existence of a gene.
Also there is a short bit about stopping spam by making it literally not pay to spam. I'd like to hear more about how that might be linkable to the biological metaphor.
I don't mean to detract from the work represented by this article, not at all. But I would like to know more about how the system analyzes and exploits the realities of biological dynamics to make a superior antispam tool. For example it would appear that some "genes" might be postulated for links to websites or even mail servers (the vectors of the disease). And some linguistics tools might even help link references to product types as genes.
Finally, and this is just brainstorming really not criticism, I was bothered by the development of a 0 to 1 probability of ham or spam. This is to me the biggest problem with automated filters. I know it can be done, since my antispam method consists of hitt
-
Little biology?New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against say SpamAssassin which has made the news recently for its high quality.
Also it was not clear to me the connection with biology.. that is, it seems that genetic analysis tools might be very useful, and the ideas about how spam acts like an organism and has "genes" is great. But, it was not clear that this has anything to do with the programming strategy.
For example, the use of a perceptron might be a great idea but to someone not trained in them it is hard to see how a multilayer perceptron would be especially good. Also it is not clear that this is what is used in real world genetic analyses. (For example it would have been interesting if genetic databases and bioinformatics tools like BLAST were mentioned). Also the Chromosome object does not obviously have anything to do with a real chromosome; it is confusing and made me wonder if there was something I was missing, or was it named that way to sound "cool"? Also it was not clear to me if any of the dynamics of genetic transcription and whether gene crossover, mutation, and selection have anything to do with this project.
Also I am curious about the choice of programming language. Being a perl fanatic I wonder why that is not being used, and of course perl is great at text, and pattern matching, and the important parts of many modules are invariably in C or C++ already, etc.. But also perl is a language of choice for bioinformatics, and there are a number of existing modules for example BioPerl which wraps other programs and Boulder which is an interesting format that could be used to pipe spam genes to other people's filters. Now I don't know if existing bioinformatics tools could be applicable but certainly these are things that ought to come to mind.. and what these tools do is not trivial, and if genes are a valid metaphor for spam components then there is a potential for existing code to be used too. That is something that would be cool.
There are also documented, easy to extend perl modules related to using genetic algorithms or for rolling your own analysis modules, I'm thinking of Genetics and AI::Genetic.
Finally I note the use of the term Corpus. This is really interesting, and suggests the author is into computational linguistics which also represents a massive amount of existing, nontrivial pattern resolution code.
So I'd like to know more about the relationship of both computational biology and computational linguistics to spam. For example, one big part is going to be how to identify genes, or whether you need a generator of pattern matchers that will be able to identify the existence of a gene.
Also there is a short bit about stopping spam by making it literally not pay to spam. I'd like to hear more about how that might be linkable to the biological metaphor.
I don't mean to detract from the work represented by this article, not at all. But I would like to know more about how the system analyzes and exploits the realities of biological dynamics to make a superior antispam tool. For example it would appear that some "genes" might be postulated for links to websites or even mail servers (the vectors of the disease). And some linguistics tools might even help link references to product types as genes.
Finally, and this is just brainstorming really not criticism, I was bothered by the development of a 0 to 1 probability of ham or spam. This is to me the biggest problem with automated filters. I know it can be done, since my antispam method consists of hitt
-
Little biology?New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against say SpamAssassin which has made the news recently for its high quality.
Also it was not clear to me the connection with biology.. that is, it seems that genetic analysis tools might be very useful, and the ideas about how spam acts like an organism and has "genes" is great. But, it was not clear that this has anything to do with the programming strategy.
For example, the use of a perceptron might be a great idea but to someone not trained in them it is hard to see how a multilayer perceptron would be especially good. Also it is not clear that this is what is used in real world genetic analyses. (For example it would have been interesting if genetic databases and bioinformatics tools like BLAST were mentioned). Also the Chromosome object does not obviously have anything to do with a real chromosome; it is confusing and made me wonder if there was something I was missing, or was it named that way to sound "cool"? Also it was not clear to me if any of the dynamics of genetic transcription and whether gene crossover, mutation, and selection have anything to do with this project.
Also I am curious about the choice of programming language. Being a perl fanatic I wonder why that is not being used, and of course perl is great at text, and pattern matching, and the important parts of many modules are invariably in C or C++ already, etc.. But also perl is a language of choice for bioinformatics, and there are a number of existing modules for example BioPerl which wraps other programs and Boulder which is an interesting format that could be used to pipe spam genes to other people's filters. Now I don't know if existing bioinformatics tools could be applicable but certainly these are things that ought to come to mind.. and what these tools do is not trivial, and if genes are a valid metaphor for spam components then there is a potential for existing code to be used too. That is something that would be cool.
There are also documented, easy to extend perl modules related to using genetic algorithms or for rolling your own analysis modules, I'm thinking of Genetics and AI::Genetic.
Finally I note the use of the term Corpus. This is really interesting, and suggests the author is into computational linguistics which also represents a massive amount of existing, nontrivial pattern resolution code.
So I'd like to know more about the relationship of both computational biology and computational linguistics to spam. For example, one big part is going to be how to identify genes, or whether you need a generator of pattern matchers that will be able to identify the existence of a gene.
Also there is a short bit about stopping spam by making it literally not pay to spam. I'd like to hear more about how that might be linkable to the biological metaphor.
I don't mean to detract from the work represented by this article, not at all. But I would like to know more about how the system analyzes and exploits the realities of biological dynamics to make a superior antispam tool. For example it would appear that some "genes" might be postulated for links to websites or even mail servers (the vectors of the disease). And some linguistics tools might even help link references to product types as genes.
Finally, and this is just brainstorming really not criticism, I was bothered by the development of a 0 to 1 probability of ham or spam. This is to me the biggest problem with automated filters. I know it can be done, since my antispam method consists of hitt
-
Little biology?New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against say SpamAssassin which has made the news recently for its high quality.
Also it was not clear to me the connection with biology.. that is, it seems that genetic analysis tools might be very useful, and the ideas about how spam acts like an organism and has "genes" is great. But, it was not clear that this has anything to do with the programming strategy.
For example, the use of a perceptron might be a great idea but to someone not trained in them it is hard to see how a multilayer perceptron would be especially good. Also it is not clear that this is what is used in real world genetic analyses. (For example it would have been interesting if genetic databases and bioinformatics tools like BLAST were mentioned). Also the Chromosome object does not obviously have anything to do with a real chromosome; it is confusing and made me wonder if there was something I was missing, or was it named that way to sound "cool"? Also it was not clear to me if any of the dynamics of genetic transcription and whether gene crossover, mutation, and selection have anything to do with this project.
Also I am curious about the choice of programming language. Being a perl fanatic I wonder why that is not being used, and of course perl is great at text, and pattern matching, and the important parts of many modules are invariably in C or C++ already, etc.. But also perl is a language of choice for bioinformatics, and there are a number of existing modules for example BioPerl which wraps other programs and Boulder which is an interesting format that could be used to pipe spam genes to other people's filters. Now I don't know if existing bioinformatics tools could be applicable but certainly these are things that ought to come to mind.. and what these tools do is not trivial, and if genes are a valid metaphor for spam components then there is a potential for existing code to be used too. That is something that would be cool.
There are also documented, easy to extend perl modules related to using genetic algorithms or for rolling your own analysis modules, I'm thinking of Genetics and AI::Genetic.
Finally I note the use of the term Corpus. This is really interesting, and suggests the author is into computational linguistics which also represents a massive amount of existing, nontrivial pattern resolution code.
So I'd like to know more about the relationship of both computational biology and computational linguistics to spam. For example, one big part is going to be how to identify genes, or whether you need a generator of pattern matchers that will be able to identify the existence of a gene.
Also there is a short bit about stopping spam by making it literally not pay to spam. I'd like to hear more about how that might be linkable to the biological metaphor.
I don't mean to detract from the work represented by this article, not at all. But I would like to know more about how the system analyzes and exploits the realities of biological dynamics to make a superior antispam tool. For example it would appear that some "genes" might be postulated for links to websites or even mail servers (the vectors of the disease). And some linguistics tools might even help link references to product types as genes.
Finally, and this is just brainstorming really not criticism, I was bothered by the development of a 0 to 1 probability of ham or spam. This is to me the biggest problem with automated filters. I know it can be done, since my antispam method consists of hitt
-
Re:You can simulate it in perl
Do you think this would be easily extensible to graphical programs?
Not sure where you're going with that, but if you can get the "language" into a perl data structure you should be good to go.
What about automatically "expanding" perl's special variables?
I think you want use English;. -
You can simulate it in perl
You could do this in perl if you wanted to. For instance, you can code perl in Latin. It's done using the Filter::Util::Call module, which lets you preprocess your perl code. Read Damian Conway's discussion about it. He gives a simple example using Klingon keywords and talks about implementing a Switch function in perl.
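A minimal sketch of the source-filter idea, assuming only the core Filter::Util::Call module. The package name KeywordFilter and the two stand-in keywords are made up for illustration; they are not taken from Damian Conway's modules, and the package is defined inline only so that the sketch fits in one file.

```perl
use strict;
use warnings;

# Hypothetical filter package, defined inline so the sketch is a single file.
BEGIN {
    package KeywordFilter;
    use Filter::Util::Call;

    # Made-up keyword table: source word => real Perl keyword.
    my %keyword = (ghItlh => 'print', chegh => 'return');

    sub import {
        # Install a closure filter on the file that said "use KeywordFilter".
        filter_add(sub {
            my $status = filter_read();    # appends the next source line to $_
            if ($status > 0) {
                s/\b(ghItlh|chegh)\b/$keyword{$1}/g;
            }
            return $status;
        });
    }

    # Tell "use" the package is already loaded, so it skips the file lookup.
    $INC{'KeywordFilter.pm'} = __FILE__;
}

use KeywordFilter;

# Everything below this point is rewritten before perl compiles it.
sub greet {
    my ($name) = @_;
    chegh "nuqneH, $name";
}

ghItlh greet('world'), "\n";
```

In a real project the filter package would normally live in its own .pm file; the rewriting step itself would be the same.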
-
Re:dmoz
I second that. Use Catalog to pump the dmoz files into MySQL. This should give you a nice big database, well over 1GB.
Note: I think I had to use Catalog 1.01 because 1.02 didn't work.
-
Re:Oh Gawd! - mentifex kook has escaped usenet asy
Association for Computing Machinery on Mentifex artificial intelligence
Ben Goertzel, Ph.D., on Mentifex artificial intelligence
Comprehensive Perl Archive Network: Mentifex AI mind.txt gameplan
eGovOS Open-Source Government Reference Book includes Mentifex AI
Free Software Donation Directory: Mentifex AI Project
Nanomagazine interviews Mentifex on independent AI scholarship
Redpaper archive of Mentifex documents on artificial intelligence
AI has been solved.
Agents Portal selling Mentifex AI4U textbook of artificial intelligence
GameDev.net selling Mentifex AI4U textbook of artificial intelligence
GreatMindsWorking selling Mentifex AI4U: Mind-1.1 Programmer's Manual -
Re:Prior art - here's mine from a year+ ago ..
I've had an IP-to-location tool on my personal web site for over a year. It uses one of the simpler ways of determining location (the Perl module Geo::IPfree; looks like the 0.1 release for that was 2002). So does this mean that my use of that module puts me in violation of the patent, or is the Perl module itself in violation? -
Re:C++ 60X Faster Than Java
Memoizing a function in Perl is even easier:
# This is the documentation for Memoize 1.01
use Memoize;
memoize('slow_function');
slow_function(arguments); # Is faster than it was before
This is just one of the advantages of a compile-on-demand language.
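The snippet above leaves slow_function undefined, so here is a self-contained sketch of the same idea using the core Memoize module. The fib function and the $calls counter are illustrative names, not part of the Memoize documentation.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real function body runs

# Naive recursive Fibonacci: exponential time without caching.
sub fib {
    my ($n) = @_;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# Replace fib with a caching wrapper; the recursive calls go through it too.
memoize('fib');

print "fib(30) = ", fib(30), "\n";
print "function body ran $calls times\n";
```

Because each argument is computed only once, the body runs once per distinct value of $n instead of over a million times.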
Being an old C hack, I can't help believing that C or C++ should be faster than Perl, but every time I get around to writing equivalent programs and testing them, I find that I am wrong. I'm sure that if I did something purely CPU-intensive, then I would find C or C++ to be the winner, but most of my examples have been I/O based, and perl kicks ass for I/O. -
Those crazy Perl users have beaten them to it!
It's more convenient than a Web interface and has no arbitrary limits...it's a quantum computing module for Perl! There's also libquantum for C users, and QCF for Matlabbers.
-
Re:It's not the language it's the library.
I don't argue that there is not a large python library. I am arguing that it's not centrally located, searchable, well documented and easy to install and update using a built in mechanism.
CPAN provides all that and more. No other language comes close and I think it's about time they figured out why.
Why does perl have CPAN but python does not have piPAN (a great name for it too!)? Have you ever tried searching for a python module to do something specific? Say you wanted to write a jabber proxy in python that examined each message, logged it to a database, and cleaned up dirty words. How long would it take you to hunt down a class to build proxy servers, a class to parse jabber messages, a class to write to the database, and a class to filter out dirty words? How long would it take to search for the same classes in Perl?
And once you found those classes, how well are they documented? Do they have sample code? Is each function and method documented? Take a look at the documentation here
Is there anything on the python sites like this? -
Re:What about readability?
I personally would much rather somebody came up with a longhand way to do real regexps.
How about Regexp::English in Perl 5 or Perl6::Rules in Perls 5 and 6?
-
Don't laugh
You mean a nested webserver, that only works as long as you keep your browser window open? Gee, that's technology!
Actually, I have seen this very idea in Perl--a CGI script or a mod_perl module using HTTP::Daemon or raw IO::Socket::INET sockets to start a temporary http daemon listening on a random port for the purpose of serving graphics made on the fly embedded in a generated web page. Very good for statistics and charts so you can serve everything--HTML and graphics--with one instance of script/module without the need to include complex data in URIs of embedded images, which would run some other script to generate graphics, and without the problem of getting the right dimensions of images if they are not constant. This is actually quite a good idea.
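A rough sketch of the random-port trick, assuming the raw IO::Socket::INET route rather than HTTP::Daemon. The helper names start_listener and serve_once, and the chart.png path, are hypothetical; a real script would also need timeouts and error handling.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Bind to port 0 so the operating system assigns a free port.
sub start_listener {
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Listen    => 1,
        Proto     => 'tcp',
    ) or die "cannot listen: $!";
    return $server;
}

# Answer exactly one HTTP request with the given body, then return.
sub serve_once {
    my ($server, $type, $body) = @_;
    my $client = $server->accept or die "accept failed: $!";
    while (defined(my $line = <$client>)) {    # read past the request headers
        last if $line =~ /^\r?\n$/;
    }
    print {$client} "HTTP/1.0 200 OK\r\n",
                    "Content-Type: $type\r\n",
                    "Content-Length: ", length($body), "\r\n\r\n",
                    $body;
    close $client;
}

my $server = start_listener();
my $port   = $server->sockport;    # the port the OS actually picked

# The generated page points the browser at the temporary listener.
print qq{<img src="http://127.0.0.1:$port/chart.png">\n};

# After sending the page, the script would block here until the image is fetched:
# serve_once($server, 'image/png', $png_bytes);
```

The call to serve_once is left commented out because it blocks until a client connects; $png_bytes stands for whatever image data the script generated.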
-
And I say,
welcome to the party, PHP.
-
Re:One Big LAME
I know what I'm going to do -- write a simple web-based interface to display the current song, and have a button to skip to the next song using the perl module Mac::iTunes::AppleScript.
That way, I can have all the songs served off of the G5 in my bedroom, but control the music from my laptop next to the stereo.
-
Re:You got fooled!
Um, no, actually. It started off as a joke at the time, but since then Parrot has actually turned into a real project which will run Perl 6 and, eventually, Python and other interpreted languages. (The Perl folks are in much more of a hurry to ditch their spaghetti Perl 5 VM, so that's priority #1. :-P) But there are some strong rumblings in the Python community about the Python port in progress, there are quite a few references to JVM bytecode translation and a Scheme port, and I've seen unsubstantiated rumors of Ruby and PHP ports. True, the core Python community isn't planning a switch yet, but if someday down the road the standard Parrot distribution comes with a Python frontend, people might start flocking to it for the one-stop convenience.