alansz · Slashdot Mirror

S-plus / R? on Use of Math Languages and Packages in Research? · 2003-02-26 08:01 · Score: 1

I have gone several ways on this question: Mathematica for symbolic solving, and coding my own in C for mathematical modeling. But I did once use S-plus, which is a fairly nifty matrix algebra system that might be useful to you. An open source variation is R.

Bayesian filter anyone? on Google vs. Boilerplate Activism · 2003-01-27 11:01 · Score: 3, Interesting

Given a particular topic of letter, this problem isn't too different from looking for spam vs. ham, and can be approached in similar ways (e.g. a Bayesian filter).

Actually, you could probably do quite well at identifying boilerplate by simply dropping all punctuation, spaces, and capitalized words, and then computing a hash (say, md5) over every even letter and over every odd letter. If either hash matches either hash of another letter, that should be a very specific indication of boilerplating.
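
A minimal Python sketch of the hashing idea above. The function names are made up for illustration, and a "capitalized word" is assumed to mean any word whose first letter is upper-case; "even" letters are assumed to be those at positions 0, 2, 4, and so on.

```python
import hashlib
import re


def boilerplate_hashes(text):
    """Return the set of two md5 digests (even letters, odd letters) for a letter."""
    kept = []
    for word in text.split():
        # Strip punctuation and digits so only letters remain in the word.
        letters = re.sub(r"[^A-Za-z]", "", word)
        # Drop capitalized words (names, salutations, sentence starts).
        if letters and not letters[0].isupper():
            kept.append(letters.lower())
    stream = "".join(kept)  # joining also removes all spaces
    if not stream:
        return set()  # nothing left to compare
    return {
        hashlib.md5(stream[0::2].encode("ascii")).hexdigest(),  # even letters
        hashlib.md5(stream[1::2].encode("ascii")).hexdigest(),  # odd letters
    }


def looks_like_boilerplate(letter_a, letter_b):
    """True if either hash of one letter matches either hash of the other."""
    return bool(boilerplate_hashes(letter_a) & boilerplate_hashes(letter_b))
```

Two letters that differ only in names and punctuation reduce to the same letter stream and so match; a letter written from scratch will not.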

These still require a corpus of letters, though, or a way to generate one from a search.

Better options than dd on Data Mining Used Hard Drives · 2003-01-15 16:34 · Score: 2, Informative

Actually, using dd from /dev/zero is not a highly secure way to wipe a drive (though it's a lot better than nothing!)

For stuff like medical data, financial data, etc., I'd seriously consider looking into wipe instead, which uses Peter Gutmann's patterns.

Re:How I block Korean spam on The Measured Effectiveness of Blocking Asian Spam · 2002-11-13 16:40 · Score: 1

I'll second the nomination of SpamAssassin. In the last 30 days, it tagged 427 messages to me as spam. No false positives, and probably about 30 or so false negatives (I use the standard threshold). I could probably tweak it to do even better.

Not just FP,FN, but Base Rates! on Face-Scanning Loses by a Nose in Palm Beach · 2002-05-27 00:17 · Score: 2, Informative

This problem is exactly analogous to the proposal to test all married couples for HIV that went around Chicago some years back. Surprise, surprise, the base rate of HIV among to-be-married couples was quite low. More false positives than true positives. Lots of wasted time, money, and stress on re-screening.

As you may know, Bayes' Theorem (actually a statement of fact in probability theory) says:

Post-test odds = Likelihood Ratio * Pre-test odds

(Where the likelihood ratio for a positive test is the sensitivity/(1-specificity), or TP rate / FP rate)

If your pre-test odds of being a terrorist are very low (and when you consider how many terrorists fly compared to how many non-terrorists fly, they must be exceedingly low), you're going to need a very, very powerful ("highly specific" in medical terms) test if you want to reliably determine that a given person ought to be treated with greater care.

On the other hand, if they were planning to spend a lot of time and money screening people anyway, and they could improve their sensitivity (TP rate), facial recognition might be a (statistically) sound approach to screening *out* suspects. That is, once you pass a face-detection screen that has a high TP rate, you don't need to be subjected to as much extra screening; but if you fail the face-detection screen, it's not really diagnostic.
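
To make the odds form of Bayes' Theorem concrete, here is a small Python sketch. The function name and all of the numbers in the usage note are hypothetical, chosen only to illustrate the base-rate effect; they are not measurements of any real face-scanning system.

```python
def post_test_probability(prior, sensitivity, specificity, positive=True):
    """Post-test probability from a pre-test probability and test characteristics.

    Uses: post-test odds = likelihood ratio * pre-test odds.
    """
    pre_odds = prior / (1 - prior)
    if positive:
        # LR for a positive test: TP rate / FP rate
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        # LR for a negative test: FN rate / TN rate
        likelihood_ratio = (1 - sensitivity) / specificity
    post_odds = likelihood_ratio * pre_odds
    return post_odds / (1 + post_odds)  # convert odds back to a probability


# Hypothetical numbers: 1 traveller in a million is a terrorist, and the
# scanner is 99% sensitive and 99% specific.
flagged = post_test_probability(1e-6, 0.99, 0.99, positive=True)
cleared = post_test_probability(1e-6, 0.99, 0.99, positive=False)
```

With those assumed numbers, a flagged traveller still has only about a 1 in 10,000 chance of being a true positive, while a cleared traveller's probability drops well below the already tiny base rate, which is the "screening out" case.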

Normally, you could use my diagnostic test calculator to fool around with numbers yourself and see what the impact would be, but it appears to be down until I can get to the server (dratted dist upgrade!)

Re:What's up, doc? on UK Home Office plan: ID Chips in Everything · 2002-05-12 06:03 · Score: 2, Insightful

Well, let's say I live somewhere where the local folk decide it's a good idea to have a book-burning - Harry Potter, maybe, or Catcher in the Rye. Or the local government decides certain books and those who read them are subversive and should be watched. Or the local corporations decide that if they could compile a big database of who buys certain types of books, they could "target" their marketing of associated products, and sell lists of, e.g. Kilgore Trout fans, to the highest bidder.

Be awfully convenient for them to be able to find who's got those books, and where, don't you think?

(It's only paranoia until they get you. :)

Re:Even doctors are abanodning the Hippocratic Oat on First, Do No Harm - A Hippocratic Oath for Coders? · 2002-05-06 14:18 · Score: 3, Informative

Ok, it's slightly off-topic, but just to clear the record.

I work at the College of Medicine of the University of Illinois at Chicago, which is the largest one in terms of MDs graduated annually in the US (about 400 per year).

Like many other US Medical Colleges, the oath that graduates take is the 1948 Declaration of Geneva version of the Oath of Hippocrates, which reads:

Now being admitted to the profession of medicine, I solemnly pledge to consecrate my life to the service of humanity. I will give respect and gratitude to my deserving teachers. I will practice medicine with conscience and dignity. The health and life of my patient will be my first consideration. I will hold in confidence all that my patient confides in me. I will maintain the honor and the noble traditions of my medical profession. My colleagues will be as my family. I will not permit consideration of race, religion, nationality, party politics, or social standing to intervene between my duty and my patient. I will maintain the utmost respect for human life. Even under threat I will not use my knowledge contrary to the laws of humanity. These promises I make freely and upon my honor.

As you can see, even medicine changes with the times, while trying to maintain the important features of the Oath of Hippocrates.

Re:Give 'em good tools and they'll build it themse on Community Networks and Websites? · 2002-05-06 07:59 · Score: 1

Among my favorite tools are the various flavors of MUSH server, such as the one I maintain, PennMUSH. In many ways, muds can provide everything you've asked for -- categorized fora (real-time chat channels, virtual spaces, and asynchronous bulletin boards) that are user-extensible with a relatively simple initial set of commands, a clean interface (text with ansi color), virtually no lag, and a boss-friendly appearance.

I have been involved with communities that started out of MUSHes and later evolved into off-line communities, and vice versa.

Life mirrors art, or... on Fire Extinguisher Balls · 2002-05-03 17:19 · Score: 1

Do not taunt Happy Fun Ball, fire extinguisher model

Re:Any open relay honey traps? on Spam Slows AT&T Email · 2002-02-23 06:49 · Score: 1

Many people who run honeypots base them directly on sendmail, by running "sendmail -bd" on systems that aren't supposed to be mailservers, as described in this page.

Protecting my server, thank you very much on Are SPAM Blacklists Unreasonable? · 2002-02-15 11:18 · Score: 5, Informative

DNS-based blacklists are not your problem. There are no more than a dozen that are really widely used (some orbs spinoffs like http://www.ordb.org and http://www.orbz.org, the MAPS ones if you're willing to pay (or can get a hobby contract) at http://www.mail-abuse.org, and the collection at http://relays.osirusoft.com that includes open relays, spamhaus, and SPEWS). All of these systems have clearly-published listing policies and are actively maintained, and if you're blocked by one of them, you'll likely get out sooner or later once you're clean. (In some cases, you can have them automatically retest you.) Plenty of mail admins find that using the information on these sites to protect their mail servers from spam is highly effective.

Your problem is twofold. First, while you've cleaned up your open relay, plenty of spammers and spam-friendly hosts make the same claim and lie (Rule #1: Spammers lie). So you may have to be patient.

More importantly, your server ip may now be sitting in hundreds of private blacklists of mail servers whose admins don't like to use the centralized lists, and just reject/blackhole spammers on their own. It is the presence of well-trusted centralized blacklist services that gives you even the hope of ever having decent communication, because without them, you'd get into a thousand tiny blacklists and never get out.

(P.S. Note that if you're checking your status using the rblcheck tool at http://relays.osirusoft.com, it will tell you about a lot of blacklists that are not intended to be publicly used and not part of the usual osirusoft dnsbl, as well...)
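
For anyone who hasn't looked at how a mail server consults a DNS-based blacklist, here is a rough Python sketch of the usual convention: reverse the octets of the connecting IP, append the list's zone, and do an ordinary A-record lookup. The zone `dnsbl.example` and the function names are placeholders, not a real list or a real tool's API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a given DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """True if the zone returns an A record for the address.

    Simplification: any resolver failure (including a timeout) is treated
    as "not listed" here; a real mail server would distinguish the cases.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Placeholder zone; substitute the zone of a list you actually use.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example")
```

Here `name` is `99.2.0.192.dnsbl.example`. Because each list is just a DNS zone, checking several of them is a matter of repeating the lookup with a different zone.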

Journal of Virtual Environments on Quantification of EQ Players · 2002-02-12 02:50 · Score: 2, Informative

An online publication venue for this kind of work (and a place to go to read other related work) is the Journal of Virtual Environments (formerly Journal of Mud Research).

It's good, but... on Bastard Operator from Hell II (Son of the Bastard) · 2002-02-06 04:51 · Score: 3, Informative

First, a disclaimer. I like this book. Despite having printouts of most of the BOFH stuff already, I bought both the first BOFH book from Plan Nine and this one, because it's nice to have a bound copy to put on the desk to scare users.

But there are a few critical points that should be made about the second book, and that can hopefully be avoided in the next installment:

The PFY comes from nowhere. The stories that introduce him aren't included, so if you don't already read BOFH online, he appears rather abruptly.
Illiad's illustrations are cute, but there are only about four of them, repeated over and over, which is a real shame.
The price, as previously noted, seems a bit much for the quality of the paper, etc. used, but obviously, the market will bear (has borne) it.

As compared to the first BOFH book, you get a lot more BOFH vs. corporation (especially accountants) and less BOFH vs. users. Depending on your outlook, this may be a very positive thing or not. :)

Re:As a software developer myself... on Beta-Testers and Intellectual Property? · 2002-02-05 05:50 · Score: 2, Interesting

Last I checked, if someone patches my (source freely available) code, they've created a derived work, and I retain the copyright. Assuming that their patch can't stand alone as a separate work, it's legally mine.

Can't you hear it now? on Super Bowl Commercial Skewer-a-thon · 2002-01-30 18:40 · Score: 3, Insightful

"Super Commercials: A Mental Engineering Special" is made possible by a grant from Doubleclick.

Difficulty of producing data on Scientists No Longer Sharing Information? · 2002-01-27 15:58 · Score: 2, Interesting

Now, I'm not a geneticist, I'm a research psychologist in the area of medical judgment and decision making, where professional norms are to keep your data for years and provide it on request, but I fully understand the problem of "difficulty/convenience" -- even in finding your old data for yourself.

This is a place where research scientists could really use some good old fashioned technological and social help from programmers. Consider a typical computer-administered psychological experiment's process:

Write code to run the experiment and log the data. If you didn't document the code or write self-documenting code, you'll have trouble when someone wants the data later.
Run the experiment and collect those log files. If the log file format isn't self-documenting, you'll have trouble when someone wants the data later.
Get all the log files transformed into a format that can be usefully imported into statistical software. If you didn't document all the variables and values in the resulting stat file, you'll have trouble when someone wants the data later. And most statistical packages allow you 8 characters for variable names and make detailed labelling of variables and values highly tedious.
Analyze the data and produce some output. If you didn't save the analysis details (as is all too easy to forget if you're doing stats with a dialog-box-based program), you'll ... (well you know the rest).
Write a paper describing what you did and submit it to a journal. Have it accepted (hopefully) in 4-6 months. Have it appear about 6 months later. It is now probably 18-24 months since you started the study. If you're lucky, you've probably changed computers at least once by now, and possibly offices/buildings/universities, too. If you can find the data yourself and understand what it means, you're ahead of the game.
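
As one hedged illustration of what a "self-documenting" data file could look like for the steps above, the Python sketch below stores a codebook (variable descriptions and value labels) as comment lines at the top of the same CSV file that holds the data. The file layout, function names, and example variables are all invented for this sketch; they are not a standard format.

```python
import csv
import io
import json


def write_self_documenting_csv(handle, codebook, rows):
    """Write the codebook as '# '-prefixed JSON lines, then the data as CSV."""
    for line in json.dumps(codebook, indent=2).splitlines():
        handle.write("# " + line + "\n")
    writer = csv.DictWriter(
        handle, fieldnames=list(codebook["variables"]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_self_documenting_csv(handle):
    """Return (codebook, rows); assumes no data line starts with '# '."""
    header_lines, data_lines = [], []
    for line in handle:
        (header_lines if line.startswith("# ") else data_lines).append(line)
    codebook = json.loads("".join(line[2:] for line in header_lines))
    return codebook, list(csv.DictReader(data_lines))


# Hypothetical experiment: descriptions and value labels travel with the data.
codebook = {
    "study": "example risky-choice task",
    "variables": {
        "subject": {"description": "anonymous participant number"},
        "choice": {"description": "option chosen",
                   "values": {"1": "risky", "2": "safe"}},
        "rt_ms": {"description": "response time in milliseconds"},
    },
}
rows = [{"subject": 1, "choice": 2, "rt_ms": 512}]

buffer = io.StringIO()
write_self_documenting_csv(buffer, codebook, rows)
buffer.seek(0)
loaded_codebook, loaded_rows = read_self_documenting_csv(buffer)
```

The point of the design is only that whoever finds the file two years later gets the variable meanings along with the numbers, without needing the original code or the original statistics package.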

This is not too far from the problem of managing a source code project over time and across maintainers. It's not enough for professional scientists to have standards for retention and sharing of data -- we need a tutorial in documentation (and statistical and other software packages that better support it).

Re:Oops on Pay to Play II - Project Entropia · 2002-01-26 12:39 · Score: 3, Informative

And by so doing, Medievia has been accused of violating the license of the Dikumud source code from which it is (by admission of its creator as well as by inspection of source code) derived, which prohibits any commercial use.

Of course, this new Entropia project gets to write their own license, assuming they're not basing their code on one of the many fine free mud codebases (where your equipment might degrade through use, but not due to economic externalities!)

The Professors Are In on Who Works During the Holidays? · 2001-12-25 12:14 · Score: 2, Interesting

As one set of research grant deadlines for major U.S. Federal agencies falls early in the year (NSF: Jan 15, NIH: Feb 1), most Decembers find me plugging away.

For those of us in academia, especially on the tenure-track, "holidays" often mean "when you're not teaching and can get around to writing up your papers or grant proposals", although I'm pleased to say that I'm also getting to travel to see my family (hooray for the laptop and the spread of home broadband).

- Alan, Asst Professor of Clinical Decision Making

Re:Out of date? on Managing Mailing Lists · 2001-10-02 09:07 · Score: 3, Interesting

Thanks for the kind review and the useful comments. I'm pleased that the book still has some value today, though I agree that the specific MLMs covered are no longer current -- I've just sent a copy of this page to O'Reilly along with a suggestion that we do a second edition, so we'll see what they say!

My playlist for MML2 would include mailman, majordomo2, listar, and ezmlm; greater discussion of non-sendmail MTAs (qmail, postfix, maybe exim); coverage of commercial options (running your own with lyris/listproc vs. outsourcing vs. egroups.com approaches); and greater depth in terms of list policy, spam issues, legalities, and tips for specific kinds of lists (like opt-in marketing to established customers).

It was a thrill to see this on slashdot. :)

- Alan Schwartz (author of Managing Mailing Lists)

Right in principle, wrong in practice on Bills to Restrict Campus Internet Access · 2000-01-24 04:48 · Score: 1

If I'm a news administrator, I have the right to decide what to carry on my news server, right? That's why spammers complaining about UDPs don't get very far with me.

Well, the fact is, if I own a network, I probably should have the right to decide what to carry over it. In this case, Arizona owns that network, its people presumably expect to provide that network for educational purposes, and its elected representatives get to decide what can be carried on it. (Unless it's a common carrier, or, as a governmentally-owned system, the 1st amendment applies, of course).

Students may be forced to find alternative internet providers (dialups) rather than use the campus network, just as you might have to find an alternative USENET source if you didn't want to participate in a UDP.

So, the bill can be right in principle (absent the 1st amendment issue). But totally wrong in practice if the goal is to save the taxpayers' money, of course -- it will certainly cost more to enforce than it will save in reduced "porn bandwidth".

(Now, you and I know that the goal is really to return to some kind of imaginary "when I was a girl, people were proper" morality, but the argument is made on cost as well, and, if the democratic process works, will be answered that way in Arizona and this bill will go down in flames.)

Comments · 45