Domain: uwaterloo.ca
Stories and comments across the archive that link to uwaterloo.ca.
Comments · 648
-
Re:Funny feeling
You're not training any filter. You're participating in a community effort to assess the accuracy of the corpus -- and, as a side-effect, the accuracy of the community judging effort. The corpus is free of charge subject to a usage agreement.
-
Re:spam is not the same as phishing!
The definition used for the creation of the corpus was
Unsolicited, unwanted email that was sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient.
For more details on issues arising in labelling the corpus, see Spam Corpus Creation for TREC or The TREC 2005 Spam Track Overview. And if you have a spam filter, sign up for TREC 2006! -
Re:Polish politeness.
Umm... coding has LOTS to do with mathematics. At least, the algorithmic/efficiency side of coding has to do with mathematics. Hence why computer science sits in the mathematics faculty at University of Waterloo, arguably one of the top computer science schools in the world.
-
Real spam research
Why does Slashdot not report on real spam research? They report puff pieces like this and the phishing talk from the MIT Spam Conference, but not the results of TREC 2005 Spam Track (Hint: an outsider using compression techniques was very strong; open source filters like crm114, dbacl, bogofilter and spamassassin were close behind; DSPAM was middle of the pack.) No filter came close to demonstrating those widely-claimed 99.9-whatever% accuracy figures. I guess "news for nerds -- stuff that matters" includes testimonials but not results.
The TREC tests involved tests on 350,000 email messages. A 92,000 message public corpus from this effort is available for free download.
John Graham-Cumming (no relation to TREC) has created SpamOrHam -- a community-based effort to adjudicate the judgements in the TREC corpus. This'll let us test in a big way Yerazunis' contention that spam filters are better than humans.
Any filter writer can participate in TREC 2006 by submitting a letter of intent now and a filter in due course.
There's also an upcoming scientific spam conference this summer - CEAS. -
Re:Can someone explain this to me?
Yes, for instance we could say it is malicious if it wouldn't halt
It's a sad commentary on Slashdot's users that this is only modded to (as of this writing) +3.
Come on guys, shouldn't every coder have at least some vague idea of what the Halting Problem is, not to mention its implications for computing in particular and the limits of what is knowable in general?
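For readers who have not met the argument, here is a minimal Python sketch of the usual diagonal construction. The names `make_contrarian` and `says_never_halts` are hypothetical and chosen only for illustration; the point is that any claimed halting decider can be handed a program built to contradict it.

```python
from typing import Callable

Program = Callable[[], str]
Decider = Callable[[Program], bool]  # claim: True means "this program halts"


def make_contrarian(halts: Decider) -> Program:
    """Build a program that does the opposite of what `halts` predicts for it."""
    def contrarian() -> str:
        if halts(contrarian):
            while True:       # decider said "halts", so loop forever
                pass
        return "halted"       # decider said "never halts", so halt at once
    return contrarian


def says_never_halts(program: Program) -> bool:
    """A hypothetical (and necessarily wrong) decider: nothing ever halts."""
    return False


contrarian = make_contrarian(says_never_halts)
print(contrarian())  # the program halts, contradicting the decider's verdict
```

A decider that answered True instead would be refuted the other way, because its contrarian would never return. So a test of the form "malicious if it wouldn't halt" cannot be decided in general.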
Hilbert's Program is dead! Since 1931! Quick, somebody inform Slashdot! :) -
Re:Rule 1
well, people *have* tried
http://plg.uwaterloo.ca/~glmclear/research/perlos/ -
Evaluation?
I have an anti-gravity machine: http://plg.uwaterloo.ca/~gvcormac/antigravity/. I wonder why the Brits spent so much money building Harrier jets?
-
10 years ago, it was a new thing ...
Acadia has done this since 1996. However, it wasn't universal in 1996.
-
Re:602 doesn't apply
Even if importation happens, it is still not excused by 602. 602(a)(2) does not save you from 602(b).
No argument there.
No, the copy is made when the work is fixed in a medium.
This doesn't really mean anything. Is a RAM chip a medium? How about a traveling EM wave? What about punchcards?
But regardless of that question, a copy is made when a copy is made. "This tape will self-destruct in 30 seconds" does not mean that it is somehow magically not a copy.
If you and your disk are in the US, the copy is made in the US
This just doesn't work. Simple logic tells us that if the copy is being made here, then the possession of the original must be transferred here, leaving the sender without a copy.
This isn't happening.
When I send you a file, I'm sending you a COPY, not the original.
Consider an email. Say I send you a bunch of information that could not be copied legally, but by receiving that email YOU are the one doing the copying? I don't think so.
You could argue that "archival" is being done on the US side, but the copying is definitely being done on the other side of the ocean.
US courts may adopt their own legal fictions as they see fit, but law and reality aren't always the same thing. Here's a good example: In 1897 the Indiana House of Representatives unanimously passed a measure redefining the area of a circle and the value of pi. (House Bill 246) -
Bayes filters do not achieve `99.9%'
Here are the results of the latest TREC Spam Evaluation. No filter - not even CRM114 or DSPAM - comes close to 99.9% overall accuracy.
That said, filters can remove 98% of spam with about 0.1% false positives, which makes them pretty useful. Most, but not all, of those 1-in-1000 false positives are marginal anyway.
If you're interested in doing your own tests, there's a free toolkit and corpus with 92,000 messages.
-
All decision-making about emotions
Recent research points to the idea that emotions are essential to all decision-making. When the part of the prefrontal cortex responsible for processing emotions is damaged, patients can become incapable of making decisions.
"Antonio Damasio draws an intimate connection between emotion and cognition in practical decision making. Damasio presents a "somatic marker" hypothesis which explains how emotions are biologically indispensable to decisions. His research on patients with frontal lobe damage indicates that feelings normally accompany response options and operate as a biasing device to dictate choice. "
http://cogsci.uwaterloo.ca/Articles/Pages/Emot.Decis.html -
Useful Reading Materials
If she hasn't read them already, your daughter might find
David Kahn's
"The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet",
ISBN: 0684831309,
Bruce Schneier's
"Applied Cryptography: Protocols, Algorithms, and Source Code in C",
ISBN: 0471128457,
and
Alfred Menezes'
"Handbook of Applied Cryptography",
ISBN: 0849385237,
available for free online at http://www.cacr.math.uwaterloo.ca/hac/
to be of interest. -
Need more info
This sounds to me kind of like the situation in a university Unix network. I'm not entirely sure I understand what you necessarily need that wouldn't be available (though I would like to know, to get a better understanding of the question). Certainly, at the university I attended, we didn't have sudo access, but we were able to develop some rather powerful applications.
I can see an adjustment period of a couple of months, where applications you regularly use aren't available, so you ask for them to be installed. After that, assuming they don't see the general need for an application (or they don't want to have to officially support it), you could theoretically install applications under your home directory. (I was thrilled when I became a grad student, and got 100MB of disk quota, so I could compile and run Blackbox as my window manager instead of the crappy twm we were generally stuck with. In fact, I made it globally executable, so my friends could use it as their window manager. In fact, I received a phone call once from one of the admins, asking me what this spinning "blackbox" process was running on one of the undergrad servers, since I was the only grad student or professor (and therefore in the phone directory) who also ran it.)
These days, as part of my regular job, I am one of the unofficial sysadmins of a Beowulf cluster (largely because I'm one of the only ones who has developed MPI applications that run on it). I get the odd request from other users who want me to hook them up with some library or such. I compile and install it under /usr/local/whatever, and tell them how to set up their LD_LIBRARY_PATH to link against it, and they're good to go.
Again, I have to ask what you need that requires root or sudo access, that can't be solved by the rare admin call or installing under $HOME. (I really don't mean this in an insulting way. I do want to know. The story post is a little brief.) -
the best form of government...
it would be a mistake to think a democratic government is the best in every respect.
is the benevolent dictator or the enlightened despot. Now who wants to be it? Should one of you become it, don't forget the 152 Rules for being an Evil Overlord! -
Mining Software Repositories
Great Work!!!
FYI - there is a whole community of researchers that are interested in studying such large software repositories
http://msr.uwaterloo.ca/
(International Workshop on Mining Software Repositories)
Maybe you can write something and submit it over there or at least advertise your data set to that community. -
Not at all
Rational thinking is orders of magnitude slower than emotional thinking. Even more, the whole process of thinking is based on emotions - read the paper on emotional decisions. In a nutshell, we make emotional decisions first and *then* we rationalize them. If you remove fear from the set of emotions, you will seriously change the way a given person thinks. Getting rid of a fear is a bad, *bad* thing.
-
Re:How does this help?
Well, industry has been using "mythical" catalysts for years then. You'll be hard pressed to find industrial applications where electrolysis is performed that are less than 70% efficient. Norsk Hydro produces some nice ones:
Norsk Hydro Electrolysers (NHE) is today a leading producer of alkaline electrolysers. Some of NHE's electrolysers have an efficiency of over 80% (high heating value). (http://www.bellona.no/en/energy/hydrogen/report_6-2002/22871.html)
There was also an article/news brief I found last time this was being discussed on http://www.science.uwaterloo.ca/ describing using certain mildly radioactive elements to improve Water=>Hydrogen up to the 90%s
If your father noticed a change in the energy bill, you must have been producing a LOT of hydrogen really quickly (and probably put a lot of salt in it). I did the same thing as a kid with small batteries, and they tend to be limited to around 1 A max output. Given that water is a poor conductor, I have a feeling it wasn't drawing the full 1000mA, esp since it lasted for several hours. -
Re:First thing we must do...
Physicists do it with ropes and pulleys. We're a kinky bunch
:P
Crap, I just noticed that the page I lifted that from is actually hosted at my university! -
Re:Graphical Object Relationship Modeller
Well, someone was working on it in 2001, but thankfully he never finished. As others have pointed out, it's a horrible idea. Of course, Perl's current compiler/interpreter/runtime would need massive work—possibly a full re-implementation—for such a project, but none of that has anything to do with the language.
Let's be clear, here. When someone says, "you can't write foo in language x," they are full of shit, and they can take the matter up with Turing.
If they say, "language x is more suited to problems of the same class as foo than many/most/all other choices," then they might have a point, depending on the specifics.
The first claim is what was made by the original poster, who is... in case you lost me on this point... full of shit. -
kernel?
No, but it's theoretically possible. You'd have to have a perl compiler, of course. Perl has many aspirations, but I'm not aware of the interpreter itself becoming an OS.
:)
Dated, but on point....
http://plg.uwaterloo.ca/~glmclear/research/perlos/
Respectfully,
Anomaly -
u++
We had to use this for school assignments way back when. It ain't bad. A lot more feature-ful than basic pthreads.
http://plg.uwaterloo.ca/~usystem/uC++.html -
The Imprint??
"This Independent band you never heard of released a new album which you will never hear on the radio and will not be able to find in any record store within 500 miles of here.
Did you go to UW ??
-
Re:Try DSPAM
Hrm... well, no.
I wrote that paper, and the configuration I posted here is what was used in the best-scoring run.
All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.
For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.html
For your convenience, here's a link to the Spamassassin code that makes the auto-learn decision. Note that sub learn calls _get_autolearn_points() which uses "score set 2" which does not include the Bayes result. Also notice the string of ad hoc tests (which cannot be disabled) based on head_only_points and body_only_points. The main negative effects are: (1) that Spamassassin fails to train on your ordinary good mail, resulting in more false positives; (2) although the Bayes filter flags a large number of spams that the ruleset would not otherwise catch, it is not reinforced on these (worse, if the ruleset says these were ham, the Bayes filter is incorrectly trained to believe this is good mail).
-
Re:Try DSPAM
"In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained."
Hrm... well, no.
First off "number of hits" is not an "extra ad hoc condition". Number of "hits" is exactly "score". There's no difference, just two pieces of terminology for the same thing. "Level" is another thing, but I won't go into that, as it's only there for the benefit of programs like procmail, and is not used internally.
Now, on to score with and without Bayes. I understand your initial concern, but I ask you to re-visit it. There has been substantial research into Bayes auto-learning under various systems, and what is shown time and time again is that a set of well-balanced static rules (such as a set of tropisms or, in the case of SA, the static rule base) is far superior to any feedback-loop. This is why Bayes is discounted when computing auto-learning.
What you're doing is looking at outliers and saying, "see, this 'spam' was trained on as 'ham', and that means SA is broken." In fact, such errors will exist in both directions, but as long as the vast majority of spam trains as spam and the vast majority of ham trains as ham, the Bayes tokens will be correctly scored.
All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.
For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.html -
Re:Try DSPAM
Explain "broken". Works great for me.....
Some explanation appears here.
In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained.
You can adjust the "extremeness" of the score under which Bayes is trained but training will not be on what Spamassassin reports; only on what the static rules report. It is perfectly possible for Spamassassin to report "Spam" yet train as "Ham" or vice versa. This behaviour is unacceptable in a supervised training setup. I've had it correctly classify a message, only to misclassify the next instances of nearly the same message, because of this behaviour. Auto-whitelists have a similar problem.
There is no Spamassassin user parameter to alter this behaviour. I have hacked Spamassassin but it is obviously not reasonable to post a solution that requires a source change. The only way I know to make Spamassassin train properly - on its own judgments - is to force feed it externally.
The reason that Spamassassin's auto-learn is set up this way is to support unsupervised learning - in a server where it is seldom, if ever, corrected. In this setup, the built-in rules work marginally better than simple self-training. But in the supervised setup they are a disaster.
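A rough Python sketch of the behaviour described above may make the mismatch easier to see. This is not Spamassassin's code: the function names, the thresholds (0.1, 12.0, 3.0) and the required score of 5.0 are illustrative assumptions, and the "ad hoc conditions" are reduced to a single minimum on header and body points.

```python
from typing import Optional


def autolearn_label(static_score: float,
                    head_points: float,
                    body_points: float,
                    ham_threshold: float = 0.1,    # assumed value
                    spam_threshold: float = 12.0,  # assumed value
                    min_points: float = 3.0) -> Optional[str]:
    """Pick a training label from the static rules only (Bayes is ignored).

    Returns 'spam', 'ham', or None when the message is not trained on.
    """
    if static_score >= spam_threshold:
        # extra condition: both header and body rules must contribute enough
        if head_points >= min_points and body_points >= min_points:
            return "spam"
        return None
    if static_score <= ham_threshold:
        return "ham"
    return None


def reported_verdict(static_score: float, bayes_points: float,
                     required: float = 5.0) -> str:
    """What the user sees: static rules plus the Bayes contribution."""
    return "spam" if static_score + bayes_points >= required else "ham"


# A message the static rules miss entirely but Bayes flags strongly.
static, bayes = 0.0, 5.5
print(reported_verdict(static, bayes))    # verdict shown to the user
print(autolearn_label(static, 0.0, 0.0))  # label used for training
```

In this example the reported verdict is spam while the training label is ham, which is the mismatch complained about above.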
-
Re:Try DSPAM
I use Spamassassin with a special user configuration file and I train it systematically. In this configuration it works pretty well (much, much better than out of the box). But Bogofilter and Popfile work about as well. As does just the Bayesian component of Spamassassin, ignoring all the other cruft. DSPAM, on the other hand, doesn't work at all well for me.
-
Re:Comparison to other tools
TREC's Spam Track will evaluate several spam filters. There's also a toolkit for do-it-yourself comparison.
Although DSPAM is not an official participant at TREC, three configurations will be evaluated for comparison - with tum, toe, and teft training modes. Zdziarski reported some of the preliminary results in his interview, but complete and comparative results won't be available until TREC in November.
-
Re:Quick!
(by the way: Waterloo hasn't got a university, so it's quite a safe bet
Except the University of Waterloo. :) ) -
Re:It's neither
We have our own building.
:) My school's CS department is located in the MC (Math and Computer Science) building.
http://www.cs.uwaterloo.ca/ -
Obligatory Simpsons video
The name's Lanley.. Lyle Lanley..
From one of my previous comments:
Firefox Users: If the WMV doesn't work, try going tools, options, downloads, and on the bottom right click plugins, uncheck wmv, and if you don't want pdfs opening in firefox (meaning download first THEN open, I prefer this method, always faster and more stable) then uncheck pdf and anything else you don't want opening in firefox -
Re:Faulty Grasp of Science
Wow. Touched a nerve there, didn't I? Emotional tirade aside, you were right about one thing, I was in a hurry when I typed that post. I was eating a late lunch after a long day of troubleshooting.
Regardless, I will now take the proper time to address your concerns.
Perhaps I was a bit hasty in attacking your use of the word "proving", but it smacked of the semantic word games played by Humanities students worldwide. The game is to use emotionally laden words to convince people; rather than actual facts or objective science. By using the word "proving" (regardless of colloquial definitions) you introduce an emotional aspect into peoples' cognitive processes as they absorb the sentence (Hrm... well this theory is supported, but he's right, they can't actually *prove* it can they?).
Whether or not it's what you intended (and I don't believe it is what you intended - given that my comments made you so angry), it introduces an element of emotional/mental equivocation in the mind of the reader. I abhor such trickery (had you done it on purpose) and maybe came off a little strong because of it.
As for your claim that distinguishing between evidence and coherence is redundant, might I direct you to the following book by the good Prof. Thagard on the subject. Or you could go straight to his website and read the articles on the subject he has published in peer-reviewed journals.
Evidence and coherence are not the same thing, and distinguishing between them is not redundant. Evidence is (essentially) empirical observations, whereas coherence is a process akin to (but largely superseding) logical deduction by which we arrive at certain conclusions/end states.
Now for the reasons I did not post any references to support my claims, I was (as I mentioned earlier) in a hurry and secondly, there had been an ample amount of supporting references already posted in this very forum. Here's a sample:
The original article from Science:
http://www.sciencemag.org/sciext/katrina/#new/
An article from a reputable Japanese project building climate simulations:
http://www.prime-intl.co.jp/kyosei-2nd/PDF/24/11_murakami.pdf/
Information from Wikipedia:
http://en.wikipedia.org/wiki/Image:2000_Year_Temperature_Comparison.png/
http://en.wikipedia.org/wiki/Global_warming/
An article from the Geophysical Fluid Dynamics Laboratory (which has references to further supporting articles published in such peer-reviewed journals as Science, Climate Dynamics and the Journal of Climate):
http://www.gfdl.noaa.gov/~tk/glob_warm_hurr.html/
An excellent comment from this discussion itself:
http://science.slashdot.org/comments.pl?sid=162830&cid=13606953/
So there you have it, references and data supporting the arguments I made in my post. As to addressing the arguments that you made in your original post (in so far as you made any) the above references should suffice. Your numbered list amounts to nothing more than an enumeration of possible alternatives, with no data, evidence or references whatsoever to demonstrate that any of them is more likely than the currently accepted scientific consensus.
Also, you accuse me of not RTFA, well, I did, and by the time I got through with it, I had noticed many of the discrepancies pointed out i -
Re:I'm disrespectful to dirt!
Mr. Sparkle: Can you see I am serious?
Firefox Users: If the WMV doesn't work, try going tools, options, downloads, and on the bottom right click plugins, uncheck wmv, and if you don't want pdfs opening in firefox (meaning download first THEN open, I prefer this method, always faster and more stable) then uncheck pdf and anything else you don't want opening in firefox -
On the spot much?
Just make those figures up on the spot did we?
Electrolysis in production environments tends toward 70-75% efficiency (www.elecdesign.com) and can be improved with various catalysts up to 90% (http://www.science.uwaterloo.ca/)
Large scale coal plants do much better than 50% efficiency, especially the newer ones. (We have lots of coal in ND, and thus export lots of electrical power. I've toured a few plants, but don't have the literature with me so I can't dispute this other than to say it's better)
fuel cells tend towards 80% efficiency (science.howstuffworks.com) and there is room for improvement yet.
No internal combustion engine using petro is anywhere near 35% efficient. 32% is on the high end. This number floats around 30% (ford.com)
On the other hand, internal combustion engines using hydrogen tend more towards 35-38% efficiency, so about the number you quoted. (ford.com)
Also, your auto efficiency doesn't factor in the energy used in (and the efficiencies involved in) extracting and processing the fuel into gas. This information might have already been factored in your electrical plant estimate, which might explain why it's so much lower than it should be. Or that 50% is an average including low efficiency wind and solar that don't use coal.
Since I'm not actually sure what the powerplants should be, only that it's too low, I'll leave it at 50%, assuming it's fixed for extraction, etc. this gives us:
Fuelcell: 50% x 75% x 80% x 95% = 28.5%
Hydrogen internal: 50% x 75% x 38% = 14.25%
Current gas engines: well below 30% when including extracting fuels... more like 20% (ecen.com)
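The chain arithmetic is just a product of stage efficiencies. As a quick check, here is a small Python sketch using the figures quoted in this comment; `chain_efficiency` is a hypothetical helper name, and the stage labels in the code comments are my reading of the comment (the 95% factor is not explained in the original).

```python
from math import prod


def chain_efficiency(*stages: float) -> float:
    """Overall efficiency of stages in series, each given as a fraction 0..1."""
    return prod(stages)


# power plant, electrolysis, fuel cell, and the comment's unexplained 95% factor
fuel_cell = chain_efficiency(0.50, 0.75, 0.80, 0.95)

# power plant, electrolysis, hydrogen internal combustion engine
hydrogen_ice = chain_efficiency(0.50, 0.75, 0.38)

print(f"fuel cell chain: {fuel_cell:.2%}")
print(f"hydrogen internal combustion chain: {hydrogen_ice:.2%}")
```

With these inputs the four-stage fuel cell chain comes to 28.5% and the hydrogen internal combustion chain to 14.25%, so the fuel cell route still comes out at roughly twice the hydrogen engine.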
You still prove your point that internal combustion of hydrogen is undesirable, but internal combustion engines are not an efficient means of transport compared to fuelcell technology.
And I didn't use the 90% electrolysis that WILL be met once greater demand for hydrogen hits. This is already producible at small scales, and demand = competition = a reason to increase efficiency in the mainstream.
Not to mention the CO2 scrubbers, fly ash collectors, etc that make a modern coal plant 100% better for the environment than 10,000 American automobiles
--
Google innovative? Phhfft! This is Zombo-com! -
Ontario Highschool Fun
In Ontario (a Canadian province, in case you are wondering) there is a highschool competition organized by the ECOO (Educational Computing Organization of Ontario) http://www.ecoo.org/ecoocs/Contests.html. When I took part there was no age restriction; I took part in all of them once I knew of their existence.
It is generally run in teams of 4: you are given a set of 4 problems and have 2 hours to solve them. 4 people, 4 problems does not sound bad until you notice you have 30 minutes each to code and test. There are bonus points for handing in solutions early (the faster you do, the better the bonus), and there is also a bonus for a flawless first attempt.
As for the language, I don't think there is a restriction. The first year we wrote in QuickBasic (closest to my c64 basic), the other years we wrote in Turing (another Ontario specific thing; Turing is a language developed for Ontario highschools by HoltSoft / University of Toronto http://www.holtsoft.com/turing/).
Choose a language you know well and don't have a lot of trouble getting to work. One year a team showed up with their own pc (a requirement at higher levels), but could not work with their tools.
Some questions are simple, here are some I remember:
1) draw a star provided you know the number of spikes
2) game of life
3) kernel / process simulation
It is assumed you know what a highschool student should know, hence trigonometry is not explained, but for game of life or the kernel/process simulation the problem was explained in detail.
One question we were given (third stage) was:
1) here is a formula for volume of tetrahedron
2) here are 4 points, calculate the volume
3) here is another point, tell us if this point lies within the tetrahedron
We were at a loss: how were we to know whether the point lies inside or not? My friend who had this problem (I had the kernel/process simulation, but was writing way too verbose code that amounted to nothing) knew that the answer lies within the question, and tried to get us to help him, but 4 people, 4 problems: everyone had their own thing to do.
So what did we do? Cheated... ;D ... we calculated the volume, and then just always said 'yes', the 5th point was within. We got the volume right on all counts and 3 of the points right. The judge told us that our neighbouring team coded 'no' for the answer, hence got only 2 of the points right, and was contemplating whether to rerun the test (judges have alternative data sets for second runs) in the hope of getting better results ;D.
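One reading of "the answer lies within the question" is that the volume formula is itself the inside test: the point splits the tetrahedron into four smaller ones, and their volumes add up to the original volume only when the point is inside. A minimal Python sketch follows, assuming the scalar triple product form of the volume formula (the contest's actual formula is not given here); the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float, float]


def _sub(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def tetra_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Volume = |(b-a) . ((c-a) x (d-a))| / 6."""
    u, v, w = _sub(b, a), _sub(c, a), _sub(d, a)
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    triple = u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]
    return abs(triple) / 6.0


def point_in_tetra(p: Point, a: Point, b: Point, c: Point, d: Point,
                   tol: float = 1e-9) -> bool:
    """True if the four sub-tetrahedra around p add up to the whole volume."""
    total = tetra_volume(a, b, c, d)
    parts = (tetra_volume(p, b, c, d) + tetra_volume(a, p, c, d)
             + tetra_volume(a, b, p, d) + tetra_volume(a, b, c, p))
    return abs(parts - total) <= tol * max(1.0, total)


A, B, C, D = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(point_in_tetra((0.1, 0.1, 0.1), A, B, C, D))  # a point inside
print(point_in_tetra((1, 1, 1), A, B, C, D))        # a point outside
```

The tolerance is needed because the sum of floating-point volumes will rarely equal the total exactly; points on a face or edge count as inside with this test.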
For Canadian highschoolers there is another contest, this time run by one of the world's best computer universities, the University of Waterloo (watcom = Waterloo Compilers, Sybase and RIM are also Waterloo graduate startups). It's called the Canadian Computing Competition http://cemc.uwaterloo.ca/ccc/index.shtml. Unfortunately I never took part in this as no one at my school knew of it, and when I became informed it was too late.
Finally, for university students there is the ACM competition, the mother of all computer competitions. Check out the problems archive; if you solve one question a day you will have years of fun http://www.inf.bme.hu/contests/tasks/
Both in my highschool and my university, people who were interested in competition banded together and ran clubs that were mentored by knowledgeable people who were out to help us.
In highschool by last grade we coded basic stuff in ASM, C, C++, Watcom Basic, QBasic, Watcom Pascal, Borland/Turbo Pascal, Turing, OOTuring. For a class project, my friend and I did a simple statistics-based AI in bp. Heck, we went through all sorting methods. I had nothing to do at University for the first 2 years, computer programming wise.
ahh, those were the times... -
Re:Standard testing for spam filters
I think he means something like The TREC Spam Filter Evaluation Toolkit.
-
Re:Gordon Cormack's Response
Here's the learning curve and the ROC curve.
-
Gordon Cormack's Response
Zdziarski says,
Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and ran a successful test with an overall accuracy of 99.23% (not bad for a simulation).
First, I would like to thank Jonathan for his recent helpful correspondence in configuring DSPAM for the TREC Spam Filter Evaluation Tool Kit. When finalized, this configuration will replace the one currently available (along with seven others). However, I take exception to the statement above, implying that there is something wrong with the tests that Lynam and I previously published. I stand by those results. Since that report was made public, I have become aware of two others that achieve much the same results: Holden's Spam Filtering II and Sergeant's CRM114 and DSPAM (Virus Bulletin, no longer freely available).
Lynam and I said that DSPAM 2.8.3, in its default configuration, achieved 98.15% accuracy on the same corpus to which Zdziarski refers above. The report also argued that accuracy was a very poor measure of filter performance and that a false positive rate such as the 1.28% demonstrated by DSPAM would likely be unacceptable to an email user.
In recent correspondence, Zdziarski suggested three configurations of DSPAM (available here on the web) that achieved the following results:
dspam(tum) fpr 1.81% fnr 0.80% accuracy 99.20%
dspam(toe) fpr 1.94% fnr 0.59% accuracy 99.16%
dspam(teft) fpr 1.85% fnr 0.53% accuracy 99.32%
More detailed results and comparisons will be made available when our current study is complete. Don't take my word (or Jonathan's) for anything; run this filter and others on your own email. But please take great care in constructing your gold standard.
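To illustrate why a single accuracy figure says little on its own, here is a small Python sketch relating false positive rate (fpr), false negative rate (fnr) and overall accuracy. The ham and spam counts below are hypothetical round numbers chosen for illustration, not the counts of the corpus discussed here, and `accuracy` is an assumed helper name.

```python
def accuracy(fpr: float, fnr: float, n_ham: int, n_spam: int) -> float:
    """Overall accuracy given error rates (as fractions) and class counts.

    fpr: fraction of ham misclassified as spam
    fnr: fraction of spam misclassified as ham
    """
    errors = fpr * n_ham + fnr * n_spam
    return 1.0 - errors / (n_ham + n_spam)


# Same filter behaviour, two hypothetical mail mixes.
fpr, fnr = 0.0185, 0.0053
spam_heavy = accuracy(fpr, fnr, n_ham=1000, n_spam=9000)
ham_heavy = accuracy(fpr, fnr, n_ham=9000, n_spam=1000)

print(f"spam-heavy mailbox: {spam_heavy:.2%}")
print(f"ham-heavy mailbox:  {ham_heavy:.2%}")
```

The same error rates give different accuracies depending on the mix of ham and spam, and a high accuracy can hide a false positive rate that a user would find unacceptable. That is why fpr and fnr are reported separately.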
Gordon Cormack
-
Gordon Cormack's ResponseZdziarski says,
Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and ran a successful test with an overall accuracy of 99.23% (not bad for a simulation).
First, I would like to thank Jonathan for his recent helpful correspondence in configuring DSPAM for the TREC Spam Filter Evaluation Tool Kit. When finalized, this configuration will replace the one currently available (along with seven others). However, I take exception to the statement above, implying that there is something wrong with the tests that Lynam and I previously published. I stand by those results. Since that report was made public, I have become aware of two others that achieve much the same results: Holden's Spam Filtering II and Sergeant's CRM114 and DSPAM (Virus Bulletin, no longer freely available).Lynam and I said that DSPAM 2.8.3, in its default configuration, achieved 98.15% accuracy on the same corpus to which Zdziarski refers above. The report also argued that accuracy was a very poor measure of filter performance and that a false positive rate such as the 1.28% demonstrated by DSPAM would likely be unacceptable to an email user.
In recent correspondence, Zdziarski suggested three configurations of DSPAM (available here on the web) that achieved the following results:
dspam(tum) fpr 1.81% fnr 0.80% accuracy 99.20%
dspam(toe) fpr 1.94% fnr 0.59% accuracy 99.16%
dspam(teft) frp 1.85% fnr 0.53% accuracy 99.32%More detailed results and comparisons will be made available when our current study is complete. Don't take my word (or Jonathan's) for anything; run this filter and others on your own email. But please take great care in constructing your gold standard.
Gordon Cormack
-
Gordon Cormack's ResponseZdziarski says,
Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and ran a successful test with an overall accuracy of 99.23% (not bad for a simulation).
First, I would like to thank Jonathan for his recent helpful correspondence in configuring DSPAM for the TREC Spam Filter Evaluation Tool Kit. When finalized, this configuration will replace the one currently available (along with seven others). However, I take exception to the statement above, implying that there is something wrong with the tests that Lynam and I previously published. I stand by those results. Since that report was made public, I have become aware of two others that achieve much the same results: Holden's Spam Filtering II and Sergeant's CRM114 and DSPAM (Virus Bulletin, no longer freely available).Lynam and I said that DSPAM 2.8.3, in its default configuration, achieved 98.15% accuracy on the same corpus to which Zdziarski refers above. The report also argued that accuracy was a very poor measure of filter performance and that a false positive rate such as the 1.28% demonstrated by DSPAM would likely be unacceptable to an email user.
In recent correspondence, Zdziarski suggested three configurations of DSPAM (available here on the web) that achieved the following results:
dspam(tum) fpr 1.81% fnr 0.80% accuracy 99.20%
dspam(toe) fpr 1.94% fnr 0.59% accuracy 99.16%
dspam(teft) frp 1.85% fnr 0.53% accuracy 99.32%More detailed results and comparisons will be made available when our current study is complete. Don't take my word (or Jonathan's) for anything; run this filter and others on your own email. But please take great care in constructing your gold standard.
Gordon Cormack
-
Re:Bogofilter And Standardized Bayesian Testing
The answer to the second question is the Spam Track at TREC (http://plg.uwaterloo.ca/~gvcormac/spam/).
-
Re:World Domination
You might want to read this paper that Rob Pike wrote about five years ago about the state of systems research. What has happened in the computing world over the past decade is a lack of innovation, especially in computer architecture and in systems programming. In the architecture world (which Pike only tangentially covers), most of the RISC architectures and the Alpha are fading away in favor of the Intel x86 chips. In the operating systems world, the only decent and usable choices that we have are Unix clones/derivatives or Windows; Plan 9 failed to make a splash in the market, and most new OSes are heavily influenced by Unix's design. Ultra-fast and very elegant workstations are being displaced in favor of boring PCs. Even Apple is going to turn into yet another PC vendor in two years.
My goal is to be a computer scientist and to do research that would hopefully have an impact on computer science. However, I'm starting to worry about the state of the field in general. I hear more about outsourcing, architectures and OSes dying, and new DRM technologies than I hear about the latest advancements in computing. On the flip side, look at game consoles and cell phones. The next generation of game consoles gets all of the cool processors, and cell phones just get more innovative each week. If the computing field were truly innovative, in five years we would see 128-bit processors with an elegant RISC architecture, running an exokernel operating system, featuring an ultra easy-to-use desktop, and very easy to develop for, with a very high-level programming language and a nice set of libraries. (Ok, I might have been dreaming in that last sentence.) Knowing how the field is performing, in five years we'll still be using x86 PCs with either Linux/BSD, Mac OS X (only with Apple's proprietary x86 PCs), or Longhorn. But wait a minute. We will get the latest and greatest in DRM technology, since the only processor vendors left will be Intel and AMD, and they're both in the Trusted Computing Group. Oh boy.
I completely agree with you about the lack of innovation. Something needs to change before we start going into the computing dark ages.
-
Re:That's Queen's University, not Queens' Universi
Top engineering school... I don't think so. Not compared to the University of Waterloo. Their solar car, the Midnight Sun (http://midsun.uwaterloo.ca/www/), has won several races and currently holds the world record for the longest purely solar-powered tour around North America.
-
What's the purpose of any competition?
To win, to have fun, to learn something, to promote oneself, to promote awareness.
What's the point of your favourite form of entertainment?
By the way, here's Waterloo's entry.
-
Solar Lifetime
I'll be rooting for my home team.
How much energy does it take to make a solar panel? Once in a while I hear someone say that solar panels take more energy to manufacture than they will produce in their entire lifetime, but I don't buy that without any numbers...
-
University of Waterloo
The University of Waterloo in Canada generally makes a selling point of the fact that staff, faculty, students, RAs, etc retain ownership of any copyrights or patents that stem from their research.
-
Re:Beer
He's got one better on you!
Check out one directory down:
http://www.eng.uwaterloo.ca/~gmilburn/
Maybe he bent the coil in between chugging beers?