kirkjobsluder · Slashdot Mirror

Re:A Comparison of FreeBSD and Linux on FreeBSD: The Complete Reference · 2003-05-13 08:18 · Score: 2, Informative

Well...

One comparison can be found in the essay BSD: Linux With a Twist. The FreeBSD Manual also has a section on the differences primarily focused on the development model.

But just as a summary

Support

Linux has more users, more books, more groups, more mailing lists and more newsgroups. Whether this is good or bad depends on your point of view. I find comp.unix.freebsd.misc to have generally very good advice.

What you get

Most Linux distros seem to be headed towards a "complete desktop in a box" approach. In contrast, BSD just gives you a bare-bones distribution with most other applications available as packages or ports. Under BSD the kernel and core programs are treated as a coherent unit.

Flavors

Linux seems to spawn off a new distribution about once a month. There seem to be just three main BSDs, each focused on different things (FreeBSD, OpenBSD, NetBSD).

Java

The BSDs lag behind Linux in Java support a bit. I don't have any problems with Java 1.3 but I'm told Java 1.4 is not production quality yet.

License

The BSD license permits incorporation into proprietary systems. Depending on your needs and politics this is either a good thing or a bad thing.

Hardware

Linux has support for more peripherals. NetBSD has support for more CPUs.

Learning Curve

Hrm. BSD pretty much forces you to master command-line unix. The text-based install assumes a pretty good understanding of basic concepts, and while the Handbook is excellent, it also assumes a bit of knowledge.

Re:I've used genetic algorithms on Digital Darwin · 2003-05-08 15:27 · Score: 2, Insightful

Micro-evolution doesn't produce new organs, make a reptile into a mammal or a fish into an amphibian. It doesn't mean that one day a reptile happened to be born with feathers so it was a bird. It's a small variation of what's already there. A change in color or size perhaps, or for example in turtles, the shape of a water turtle's feet is good for swimming while a land turtle's feet are good for walking.

I don't know why I bother because this is too easy. One of the facts that best support evolution is that there are no truly new organs. A brain is just overgrown ganglia. A scale is just a slightly modified form of skin. A feather and hair are just slightly modified scales. Mammary glands are slightly modified sweat glands. In fact, the development of a truly different, truly new organ with no developmental ties to other previously existing organ systems would be clear evidence of design. So far, all the evidence is that if there is a designer, it works in ways that produce effects identical to evolution.

But what is amazing is that we have examples of such transformations forwards and backwards. We have arthropods that became land animals and later, insects that recolonized the water. We have amphibians that came to shore and reptiles, mammals and birds that went back to sea.

What Macro-evolution doesn't explain is species that cannot advance without certain features, for example the giraffe. The giraffe has a very long neck, which is designed not only to reach tall trees, but also so that the giraffe can bend down and drink without his brain being gorged with blood and exploding. The giraffe has a very powerful heart for getting blood up its long neck, but when he bends down to drink blood is going very powerfully downhill with the added force of gravity. There is a spongy tissue around his brain that holds blood until he is done drinking, and valves in his blood vessels to keep more blood from coming into his head. If these things were not in place in the first species of giraffe at the dawn of time, the first species of giraffe would have died as a result of too much blood going to his head and there would simply be no more giraffes. Yet according to Macro-evolution it took a long amount of time for these complex things to develop. Were there ever short neck giraffes? Could be, but, would they be designed in such a way? These things are specifically designed for an animal with a very long neck. The powerful heart would kill a short neck giraffe. He'd die of high blood pressure, yet if the first long necked giraffe was born without a strong heart, it would have died from not having enough blood to its brain. So how did this complex animal come to be? There are many different species such as this that defy the theory of Macro-evolution, species with features that have to be in place in their full form for the species to simply exist and advance because without them they will simply die.

Of course, knowing a bit about what you are talking about would help quite a bit. The question to be thrown back is why can't these features develop in tandem through gradual incremental changes? Like most people who don't know a lick about evolution, you assume that one characteristic must have appeared first suddenly, leaving the other characteristics to radically catch up. This view is perhaps the fault of biology educators who over-emphasize the role of mutation in evolution and under-emphasize the role of diversity within populations. For a start, while the problem of blood pressure regulation is more acute for giraffes, it is not unique to giraffes. Most of the features cited as essential to giraffes are present to some degree in all mammals (the basic creationist problem of no original organs again). And in fact, we have short-necked giraffes (Okapi) in the present day that, amazingly enough, show many of the adaptations cited as unique for giraffes.

But there are many other features of giraffes that make them bad candidates for design. For example, giraf

Human adoption? on Ask Security/Cryptography Expert Paul Kocher · 2003-03-13 12:44 · Score: 2, Insightful

It seems that the primary problem with cryptography is sociology, not mathematics. I spent about two weeks signing messages before co-workers complained that it made mail more difficult to read. A talk I gave last year on the importance of securing research data was attended by a total of 3 people. What do you see as the biggest barriers to adoption of digital signatures?

A gift with strings attached is not a gift... on A College Without Microsoft? · 2003-03-12 09:03 · Score: 1

I think this "donation" should be rejected as a matter of principle. I would have no objection to a donation specifically for promoting Free and Open Source Software (FOSS) on campus. But this practice can lead to a case of university policy being sold to the highest bidder. Would it also be ethical for a college to accept a donation from the religious right on the condition that it defund LGBT support centers and women's studies courses? Would it be ethical for a college to accept a donation from the "boycott france" wingnuts to defund French courses?

Re:The guy is a nut... on The Myth of Radio Spectrum Interference · 2003-03-12 03:33 · Score: 1

I don't think it is bogus science. But I do think that the article does not describe the issues very well. His main argument is that spectrum scarcity can be solved using radio transmission protocols analogous to the internet, where transmitters dynamically negotiate frequency with the receiver. There is the big catch: IF you adopt this particular technology, there is no shortage of spectrum. It is rather like saying that there would be no shortage of spectrum if everyone agreed to use CW and Morse code (CW has a very narrow bandwidth) as opposed to FM or AM.

I think that the point he is missing is that applications tend to expand to fill available bandwidth.

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 21:02 · Score: 1

Well, let's go back and review the arguments:

1. The first argument is that a reduction in reading performance is justified by a reduction in spam. Yes, even a 30% reduction in performance reading messages (a figure I find overly optimistic given my own cocktail napkin calculations) is unacceptable compared to solutions that offer 0% reduction in performance in reading messages (what I have now with procmail and SpamAssassin). From an end-user point of view, the major work should be done before the spam arrives in my mailbox, not when I open my inbox. In addition, there are considerable transition costs, in that a p2p system would be radically incompatible with existing systems.

2. The second argument for p2p is that it holds spammers accountable because each additional message fills up a finite amount of disk space. This can be defeated by custom spamming software that uses only one copy of the spam on disk. In fact, I would argue that spammers can take advantage of a p2p system by flooding the network with thousands of low-bandwidth notifications in the hopes of maximizing the number of downloads that occur before a blacklist is invoked. (In fact, this tactic is already used by image URL spam.) If the spammer targets the daylight hours when many people leave their mail clients open to download mail, quite a bit of spam can get through.

We can't control how spam software uses resources like disk space. However, we can control the bandwidth that is used for spam. This is where a throttling server comes in (which I would argue should be deployed whether we are using p2p or SMTP). Designing mail servers to automatically embargo hosts that experience sudden spikes in activity can reduce not only spam, but mail virus attacks as well. So for example, a spammer runs a script that floods a major midwestern university with messages to likely usernames generated from the city phonebook. The majority of the attack can be averted if, responding to certain conditions (large number of bad usernames, large number of connections from a specific host), the mail server automatically embargoes the IP number where the spam came from.

3. A third argument for a p2p mail system is that it enables send-side filtering. The scenario runs along the lines that our intrepid sysadmin magically discovers that a user of his system is sending spam, bans the user, and removes the spam before it reaches all of the recipients. A more likely scenario is that an overworked sysadmin runs bogofilter or SpamAssassin on the outgoing mail queue every now and then. I don't see a reason why send-side filtering requires a p2p model. Pipe both incoming and outgoing mail through a spam filter. Provide a one-time password system by which legitimate mail that looks like spam can be resubmitted.

4. Client-side authentication. Advocates of p2p mail argue that it automatically authenticates the server of origin. However, this can be done either via plaintext, by sending the username and message ID back to the server of origin requesting confirmation, or cryptographically, through digital signatures or challenge-response sequences. This would reduce "spoofing".

It doesn't *necessarily* have to be this way. The local server actually could do the retrieving at times you specify (or as soon as a message arrives!), then you could log in with POP or something similar to pick them up, just as you do now. Certainly, the most flexibility is offered when the client retrieves the mail, but this system could be flexible enough to offer a choice!

Of course, automatic downloads pretty much eliminate any of the advantages you describe.

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 15:16 · Score: 1

Of course, remember that if you're downloading 100 messages, many will likely be spam. Getting rid of most of those will automatically improve the performance some.

For some jobs I was fielding 100+ legitimate messages as soon as I logged on in the morning; it is not that uncommon.

I kind of doubt that. There are a LOT of advantages to this scheme. I'm sure you can put some band aids on the existing protocols to make them work better, but overall I think this solution has more advantages.

Such as? Sender-side filtering of spam can be done with current protocols, as can authentication, and blacklists, all without the performance hit of a peer to peer solution.

Right. Like I said, though, the mail client should be threaded, so many of these three second waits can happen at the same time. :)

Still, even with dialup, checking my email using current push technology requires less than a 1 second wait (in fact, sorting my messages takes longer than getting the list of available messages). When I'm working on console with local mailboxes the time delay is insignificant. Any alternative method must offer a similar level of performance.

If you can think of a specific way to amend the current system so that it has all these advantages, please suggest it! I'm open to considering any idea, but right now I think this is one of the best!

Ok, one of the big problems I see with this method is that if your server gets turned into a spam box you are basically up the creek without a paddle: a full file system and a possible blacklist. One technical way to stop spam at the source is to use a throttling server. Relay requests are limited to X per IP address per second (the actual number depends on the type of service you are offering). If you exceed that number, the server starts denying relaying to that IP address for increasingly long periods of time.
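To make the throttling idea concrete, here is a minimal sketch, not a real mail server. The class name, the default limit of 5 requests per second, the 60-second first lockout and the doubling rule are all assumptions for illustration; the post only says "X per IP address per second" and "increasingly" long denials.

```python
import time
from collections import defaultdict, deque


class RelayThrottle:
    """Per-IP relay limiter with escalating lockouts (illustrative only)."""

    def __init__(self, max_per_second=5, base_penalty=60.0):
        self.max_per_second = max_per_second  # the "X" above; hypothetical default
        self.base_penalty = base_penalty      # first lockout in seconds; hypothetical
        self.recent = defaultdict(deque)      # ip -> timestamps within the last second
        self.strikes = defaultdict(int)       # ip -> number of past violations
        self.blocked_until = {}               # ip -> time at which the lockout ends

    def allow(self, ip, now=None):
        """Return True if a relay request from `ip` should be accepted."""
        now = time.monotonic() if now is None else now

        # Still serving a lockout from an earlier violation.
        if now < self.blocked_until.get(ip, float("-inf")):
            return False

        # Forget requests older than one second.
        window = self.recent[ip]
        while window and now - window[0] >= 1.0:
            window.popleft()

        if len(window) >= self.max_per_second:
            # Over the limit: each new violation doubles the lockout.
            self.strikes[ip] += 1
            penalty = self.base_penalty * 2 ** (self.strikes[ip] - 1)
            self.blocked_until[ip] = now + penalty
            window.clear()
            return False

        window.append(now)
        return True
```

A server would call `allow(ip)` before accepting each relay request. The `now` argument exists only so the behaviour can be exercised without waiting on a real clock.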

One of the big problems with any mail scheme is identifying spam. About 95% of the spam I get has some form of malformed header information. This can be handled by sender-side filtering where the headers are checked for sanity as the message is submitted. Failing the sanity check blocks the message. Failing multiple sanity checks in a short period of time results in a "time out" for the IP address.
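As a rough sketch of what a submission-time sanity check could look like, using Python's standard `email` package: the three checks below (From, Date, Message-ID) are my own assumption of what "sanity" might mean, not a list taken from any real filter, and the function name is hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def header_problems(raw_message):
    """Return a list of header problems; an empty list means the checks passed."""
    msg = message_from_string(raw_message)
    problems = []

    # From: must contain something shaped like an address.
    _, addr = parseaddr(msg.get("From", ""))
    if "@" not in addr:
        problems.append("missing or malformed From address")

    # Date: must be present and parseable as an RFC 2822 style date.
    date = msg.get("Date")
    if date is None:
        problems.append("missing Date")
    else:
        try:
            parsedate_to_datetime(date)
        except (TypeError, ValueError):
            problems.append("unparseable Date")

    # Message-ID: must look like <local@domain>.
    msg_id = (msg.get("Message-ID") or "").strip()
    if not (msg_id.startswith("<") and msg_id.endswith(">") and "@" in msg_id):
        problems.append("missing or malformed Message-ID")

    return problems
```

A submitting server could block any message for which the list is non-empty, and feed repeated failures from one IP address into whatever "time out" mechanism it uses.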

The third problem, authentication, can also be accomplished without switching from a push to a pull system. The simplest form would be to have all servers in route keep a record of the message ID field. The recipient at the end of the chain has the option of querying the server of origin. "Did message ID from username originate from your system?" The server of origin can then disavow either the message, the username or both. A more complicated method could involve digital signatures or challenge-response sequences.
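The plaintext query can be sketched as a lookup on the server of origin. This is a toy in-memory model under my own assumptions: the class, method and field names are hypothetical, and how the question travels over the network is left out entirely.

```python
from collections import namedtuple

# Answer to "Did message ID from username originate from your system?"
Verdict = namedtuple("Verdict", "message_known user_known confirmed")


class OriginServer:
    """Toy stand-in for the server of origin; all names are hypothetical."""

    def __init__(self, usernames):
        self.usernames = set(usernames)
        self.sent = {}  # Message-ID -> username that submitted it

    def record(self, message_id, username):
        # Called when a local user submits a message for delivery.
        self.sent[message_id] = username

    def verify(self, message_id, username):
        # The origin can disavow the message, the username, or both.
        return Verdict(
            message_known=message_id in self.sent,
            user_known=username in self.usernames,
            confirmed=self.sent.get(message_id) == username,
        )
```

The recipient's server would treat anything other than `confirmed=True` as a spoofed message.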

A basic issue with the peer-to-peer scheme is that it takes place on MY time. When I check my messages, I have to sit and twiddle my thumbs while the server authenticates and fetches all of the messages (and some of my mail gets pretty darn big), in addition to a hefty network load spike. In contrast, I want the authentication to occur on the message's time. If the message is sent at 3 am, I want it to be authenticated and delivered by the time I check my mail at 8 am.

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 14:12 · Score: 1

That's why this plan would still require blacklists of IP numbers that aren't behaving. The difference now is that spammers would have to have their own server behind that IP address... getting an open relay to do their bidding for them just isn't going to be an option.

I don't see blacklists as an effective solution because they occur after the fact. (By which time, the spammer has moved on.)

Yes, but now it's much easier to identify servers run by bad sysadmins and nuke them. By putting some authentication into the From: field, you now know for sure that message from Spammer@yahoo.com really passed through Yahoo's hands... and my guess is that's not going to be often.

Um, PGP and digital signatures have been around for how many years without substantial changes to email protocols? Authentication does not require changing email from a push to a pull technology. (It does require standardizing on an authentication protocol.)

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 12:09 · Score: 1

Indeed that's a minor drawback. But we're talking about stopping spam here (or at least the vast majority of it). It is WORTH a bit of pain to get this problem behind us!

I disagree. Any spam solution that offers any reduction in performance over current technology for legitimate users is not a "solution". In fact most of the arguments for a peer-to-peer pull solution can be rolled into existing "push" server technology. It should not be a big deal to implement sender-side filtering (perhaps with a challenge/response system for suspicious messages), especially given that in excess of 95% of spam involves malformed mail headers. Throttling mail servers that automatically deny relaying to IP numbers that make a suspiciously large number of requests can serve the same purpose.

Having said that, I doubt this would really be that big a deal. It would be like loading 100 web pages, except that e-mail is far smaller than a web page and hopefully the server would be optimized to respond quickly. Also, the mail client could be threaded, so the bandwidth could usually be maxed out, not just sitting around waiting for servers.

I find that latencies of greater than 3 seconds are not unheard of, even on fast, well-connected networks. Compared to a typical IMAP connection, this is unsatisfactory. The issue is not volume, but negotiation.

There are trade-offs to anything, but overall, I think this proposal solves far more problems than it causes, and that's a pretty good deal if you ask me. :)

Well, as far as a solution to spam, I think it would be extremely easy to circumvent for a bunch of reasons.

1: This is based on the naive assumption that spammers would use sender-side servers that store a copy of every message sent. It would be a trivial task to create a server to send bogus notifications, then reply to requests for messages with a dynamically generated message. All it takes is an IP number.

2: It suffers from the same flaw that makes spam a problem with SMTP: a dependence on paranoid sysadmins. It is relatively trivial for a good sysadmin to prevent spam relaying; the problem is the large number of bad sysadmins who don't care.

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 06:25 · Score: 3, Insightful

When the checking-mail process begins, the client would go to the receive-side server to get the list of notifications received. It would first apply any local filter rules to strike out unacceptable notifications, then go one-by-one to the servers to confirm that they sent the message the notification claims, that the server is still offering the message, and then ask for the message itself.

The big problem I see with this is that it would work very well over robust, high-speed networks where all servers have 24/7 reliability. How well will it work over less robust or fast networks? The latency involved in querying and fetching 100 messages adds up pretty darn quick.

If the message has been declared spam by the server operator, then the server will intentionally pull the message from availability and essentially vaporize it before it hits a majority of inboxes. Server owners have an incentive to do this... because it'd be extremely easy to add server owners who don't into a local blacklist.

I think a much better option would be to stop it before it is submitted. But I see significant power issues involved with giving sysadmins the power to retroactively nuke messages by content. Yeah, it helps to stop spam but it also gives the sysadmin the power to nuke political content as well.

In addition, I can see how such a system can be technically circumvented by spammers. Set up a server to broadcast bogus notifications and just send a single file out. Blacklists are not effective then for the same reason they are not effective now: the cost of setting up on a new IP is trivial.

Yeah, a verbose log file can be made available for the geeks that wanna know what happened under the hood, but the average end user wouldn't see the message pop into their Inbox until the message has been successfully cleared and transmitted. Once it's in the Inbox, it's a local object that the user can do what they want to.

Ok, the initial description just sounded like some kind of a distributed peer-to-peer IMAP where instead of storing the messages on the recipient server the messages are fetched as they are read. But I disagree that this process will be transparent to the user, because of the added latency as the recipient server authenticates each individual message. Checking my mail with IMAP, I know what is available within a second after I open a connection (using local mailboxes is even quicker). I don't see how a "pull" system that authenticates, verifies and fetches for each mail message can match that performance.

Re:Move the onus from the recipient to the sender. on IETF to Look at Spam · 2003-03-09 04:21 · Score: 1

I see some flaws with this from the user end. Would mail clients have to negotiate a connection for every mail message? One of the things I like about email is that if a message appears in my mailbox, it is there ready to download (via IMAP). One of the things I dislike about pull technologies such as HTTP is that I never know when I request a page if the page will be available.

In addition from a user end it can make things more confusing because of the need to negotiate different policies for how long messages are retained. What happens when I need to grab that 6 month old bit of administrivia that I didn't bother to read then but became less trivial in the last hour? Having the sender control the duration and content of email can be a problem for things like email invoices.

Re:Title Changes on Cowboy Bebop Movie comes to the States · 2003-03-07 05:11 · Score: 1

Does it matter when she spends an entire page of expository dialog explaining what the stone is?

Re:Up for penalty? on BSA Accuses OpenOffice Mirrors · 2003-02-28 05:15 · Score: 1

Actually, no they have not. Part of the definition of perjury is not only that the statement was false, but that the witness knew the statement was false and intended to mislead the court. (Source: The 'Lectric Law Library.) While the statements are technically false, there apparently was no intent to mislead anyone; the BSA admitted a mistake and offered remedial actions to change future behavior. Courts deal with clerical errors all the time.

Re:sustainable and green is a very hard combinatio on UK to "get serious" About Renewable Energy · 2003-02-23 14:20 · Score: 1

I really don't buy the claim that the use of solar cells does more to modify the reflectivity of our planet than any other building construction. In fact, a major problem in building design is getting rid of solar energy (usually by piping it out of the building with air conditioners).

Re:Goddammit! on Buy a Segway... Please · 2003-02-19 09:19 · Score: 1

If you could all be so kind as to take a step back.. waaayyy back. Think of cars, particularly in cities. The fatalities. The noise. The pollution. The cost. The traffic. The space they take up. Were a self-respecting geek to examine this system from above, encountering it for the first time, I imagine they would recoil in horror. I can't see it as anything but a giant cluster-fuck.

Well ok, sorry to deflate your optimism about the Segway but...

The Segway offers nothing to the problem that has not existed for the last decade. The three-wheel scooter operates in the same footprint with similar performance characteristics but has not been widely adopted beyond people with mobility problems. Bicycles have been promoted since the 1970s as a solution but have failed in mass adoption. As a solution the Segway is overpriced, overteched and, perhaps more importantly, a technical solution to a cultural problem.

Re:Old people on Buy a Segway... Please · 2003-02-19 08:55 · Score: 1

Yes the price is a problem. And younger people would be willing to ride a bike. But my grandma could handle one of these things, and it would actually be a big help to her. She is otherwise stranded at home, dependent on taxis, neighbors, or public transportation (which in the wide- flat- towns of central California is problematic at best.)

Well, here is what I don't get about the Segway: most people who have mobility problems severe enough to require assistance can't stand for extended periods of time either. In terms of the medical mobility market it seems to be targeting the narrow segment of disabilities where walking is problematic but standing is ok. On top of this, the old three-wheel electric scooter is half the price, fully covered by insurance and comes with plenty of cargo room.

More truth to piss off fanboys on Salon on Gollum's Failed Oscar Nomination · 2003-02-18 09:21 · Score: 1

I don't think that it has more to do with timing than CGI. But let's be honest here:

1: Serkis was good, but I'm not convinced that he was THAT good in a competitive field. There are a heck of a lot of performances that were left out including Robin Williams for Insomnia, and Molina for Frida.

2: The Best Supporting Actor nominations seem to extend the self-fulfilling prophecy that movies released just before the end of the year get nominations. After all, I don't see Robin Williams for either One Hour Photo or Insomnia, both of which were films that should have been recognized. It is difficult to judge performances because I have not yet had an opportunity to see The Hours or Chicago. In fact, the only pre-December release on the best supporting actor roster is Paul Newman for Road to Perdition.

3: Just about every movie has a campaign for best supporting actor. There was even a campaign to get a nod for Lillard for his performance in Scooby Doo.

But time for the general Academy grousing here. The French Connection won Best Picture the year I was born with a script that left entire reels without English dialogue and won best score for a Mingus student that gave us a minimalist, discordant, syncopated mood. Granted, I've not seen most of the films up for nominations but the only name that even pushes the art of music scores is Philip Glass (for The Hours). Of course, there is the perpetual nod to John Williams.

If Spirited Away takes best animated film out from under the Americans, is there possibly a chance that I'll get to actually see it this decade?

Re:I know it is dangerous to review a trailer but. on League Of Extraordinary Gentlemen Trailer · 2003-02-16 13:59 · Score: 1

Ohh, looks like I pissed off a fanboy!

You imply that Aliens and Predators are not from the same creative universe; that's quite wrong. If the endless supply of creative works won't change your mind (I'm talking Dark Horse: Presents, not Batman vs. Predator), then maybe taking a closer look at the end of Predator 2 will. Yeah, that's right, there's an Alien head hanging on the wall of the trophy room Danny Glover enters.

The issue here is not so much coming from the same creative universe as throwing two popular critters together as a marketing gimmick. Perhaps I still have a bad taste in my mouth from the massive marketing crossovers that generated such ugly messes as "The Secret Wars" that threw together 80% of the Marvel creative universe on a distant planet to have them duke it out. At times crossovers and cameos are a good idea, but come on here. I read the Aliens vs. Predator stuff and recognized it was a gimmick by an up-and-coming publisher from the beginning.

"rich exploration of diverse characters bound to a common fate" is something that isn't decided by subject matter, but script, director, and actors. A story is only as good as the storyteller.

Which is my point. League of Extraordinary Gentlemen largely failed to do what Alan Moore had done so successfully with V, The Watchmen or even his work on the Swamp Thing. In terms of story, I found it to be the least promising to translate to the silver screen.

Predator is another traditional struggle; man vs. nature. The Predators represent the top of a universal food chain - they are bigger, stronger, faster, have seriously advanced technology, and are interested in hanging us on their walls. In their society, a Predator must successfully kill Kainde Amedha - literally hard-meat, their word for Aliens (Giger's, not general) - to be accepted as an adult. They hunt humans - called soft-meat, Pyode Amedha - as one of their favorite sports, because we make good prey (we're intuitive, have a strong will to survive, and shoot back). Their society is harsh and complex, but emphasises honor. In many ways, they're more civilized than we are.

I fear you've dismissed a very well-told, well-written, and beautifully drawn series just because it has 'vs.' in the title.

Oh, come on here. An entire paragraph of rationalization for putting together two of the best known sci-fi monsters of the last two decades into a comic book! Sounds about like Godzilla vs. Mothra or Frankenstein vs. The Wolfman. (Both of which also had elaborate rationalizations to get the title characters on screen together.)

Well drawn I will grant you, but well written? I might concede that it may have been possible in later issues for someone to spin gold out of the wretched shit-poor gimmick of the premise. However the issues I've read were at best mediocre. Even with your description I see at most an "Outer Limits" episode (and that isn't doing justice to the genre of short science fiction film).

The PC games are quite excellent, as well.

Strike three there. So far only about half of the video game movies made to date have been worth watching (and then, only if it was at a $2 theatre AND I was too broke for something better AND the other three screens were playing teen dramas.) The other half you could not pay me to watch and I regret paying money the first time.

Finally, don't forget that movies aren't strictly about 'art' - they're entertainment, first and foremost. I, for one, would love to see 'the perfect' Aliens vs. Predator movie, I just don't think it could be pulled off and released in the U.S.

You can't have one without the other. Well, you can but then you fall into the class of film of which Ed Wood was a master: unintentionally funny. By all means, now that you have enlightened me about the qualities of Alien vs. Predator, I would love to see it made because we can use a few more unintentionally funny films out there.

Meanwhile, you seem to be suffering from fanboy syndrome:
1: Thinks a movie made from one's favorite work would be "cool".
2: Unwilling to tolerate any criticism of one's favorite works.
3: Willing to reference trivia for the sake of argument.
4: Thinks that the videogame makes a good argument for a feature length movie.

I know it is dangerous to review a trailer but... on League Of Extraordinary Gentlemen Trailer · 2003-02-16 03:13 · Score: 4, Interesting

To be honest, League of Extraordinary Gentlemen is probably one of my least favorite Alan Moore comics, but I've never been a big fan of the genre of dumping a bunch of unrelated characters into a narrative. Perhaps the worst example is Young Indiana Jones, in which kid wonder Jones bumps into every historical figure of the 20th century. People who really think that an Aliens vs. Predator movie would be "cool" should be profoundly pitied. League does not have the rich exploration of diverse characters bound to a common fate that makes The Watchmen work, nor does it have the political poetry of V for Vendetta or the raw mystical imagination of Promethea. V is probably the Alan Moore work I would most like to see translated to the silver screen and the least likely to be made.

I will probably go see this for many of the same reasons that I saw Daredevil, a movie about which the best I can say is that it didn't suck, and it enabled me to listen in on a funny conversation about Ben Affleck's chin afterwards. Perhaps this time I'll wait for the $2 theatre.

From the trailer, we have an adaptation that isn't an adaptation. Part of the fun of the comic was the inside jokes on these Victorian characters put into a "Justice League" situation. The trailer delivers little more than "Blade" in 19th century England.

Re:Speaking is faster than transcribing? on Why Project Gutenberg Isn't There Yet · 2003-01-30 17:42 · Score: 1

Actually, I know. I was kinda trying to be funny. But there is a kernel of truth in both what I say and what you say. I know a lot of ummmm and ahhhh would not be good.

Actually, I just read a blurb saying that linguistic researchers have discovered that ums and ahhs are important for human comprehension of speech as rhythmic placeholders.

Re:Speaking is faster than transcribing? on Why Project Gutenberg Isn't There Yet · 2003-01-29 03:15 · Score: 1

Of course, here you are comparing skilled transcriptionists to unskilled speakers. Due to a bad case of RSI, I have been using speech recognition for most of the last six months. For composing papers, it is about the same speed as typing. The biggest problem would be if the text is very heavy with technical jargon, but if you add the word once, it is in the dictionary forever. Actually, transcribing using speech recognition is faster than composing using speech recognition, because the accuracy of speech recognition improves if you give it much longer phrases.

Putting it to the test. on Using gzip As A Spam Filter · 2003-01-27 12:18 · Score: 1

Ok, I decided to try it out and run my own statistics on it.

The good news is that with bzip2 it performs about the same as spamassassin. On my K6-200 BSD system it takes about the same time to process an email message as spamassassin does. Both take too much time for my taste, but that is another issue. Processing time is proportional to the size of the corpus.

It's the statistics that bother me. There is no point in comparing the means (in ambiguous terms) without the standard deviations of the groups.

So here is my data. I created a spam corpus and a ham corpus from half of my emails, then wrote a quick script to pipe the other half through the program.
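For readers who want to try something similar, here is a minimal sketch of that kind of test. It is not the script used for the numbers in this post: the function names are made up for illustration, and the "ratio" is assumed to mean the extra compressed bytes a message costs when appended to a corpus, divided by the message length.

```python
import bz2


def added_ratio(corpus: bytes, message: bytes) -> float:
    """Extra compressed bytes needed for `message` when it is appended
    to `corpus`, per byte of message. Lower means "more like the corpus".
    (Assumed definition; other definitions of the ratio are possible.)"""
    base = len(bz2.compress(corpus))
    combined = len(bz2.compress(corpus + message))
    return (combined - base) / max(len(message), 1)


def classify(message: bytes, ham_corpus: bytes, spam_corpus: bytes):
    """Return a label plus both ratios so they can be logged for statistics."""
    hratio = added_ratio(ham_corpus, message)
    sratio = added_ratio(spam_corpus, message)
    label = "spam" if sratio < hratio else "ham"
    return label, hratio, sratio
```

Note that every message costs one full compression of each corpus, which is consistent with processing time growing with corpus size.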

        hratio            sratio
ham     .122 (sd .098)    .249 (sd .079)
spam    .276 (sd .046)    .198 (sd .060)

hratio = compression ratio with ham corpus.
sratio = compression ratio with spam corpus.
n(ham) = 93
n(spam) = 39

Basically the variance kills compressing with a spam corpus as a test because there is too much overlap between the ranges. More than half of my spam was within one standard deviation of the ham. The separation between distributions when compressing with the ham corpus is ok but not that great.
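One way to put a number on that overlap is the difference between the group means measured in pooled standard deviations. This is only a sketch: the means, standard deviations and group sizes are the ones from the table, while the pooled-sd measure is an added choice, not something computed for the original test.

```python
from math import sqrt


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def separation(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Distance between the two means in units of pooled standard deviation."""
    return abs(mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


N_HAM, N_SPAM = 93, 39

# Ham vs. spam messages, compressed against the ham corpus (hratio).
d_hratio = separation(0.122, 0.098, N_HAM, 0.276, 0.046, N_SPAM)
# Ham vs. spam messages, compressed against the spam corpus (sratio).
d_sratio = separation(0.249, 0.079, N_HAM, 0.198, 0.060, N_SPAM)

print(f"hratio separation: {d_hratio:.2f} pooled sd")
print(f"sratio separation: {d_sratio:.2f} pooled sd")
```

With the table's figures the ham-corpus test separates the groups by roughly 1.8 pooled standard deviations and the spam-corpus test by roughly 0.7, which matches the impression above.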

Re:Correction on Using gzip As A Spam Filter · 2003-01-27 07:50 · Score: 1

There is a minor problem with this sentence. And with this whole gzip business. It is misleading. Words, phrases? You cannot force gzip to match words; gzip tries to exploit every similarity it finds, even at the character level. E.g., if your "spam dictionary" contains the words sex and pants, mail about sextants will have a good compression ratio.

True, but occasional spam-ham matches are a feature of Bayesian filters as well. The point is not the occasional match, but whether a text is statistically more similar to spam than ham.

But I would argue that working at the character level or extended phrase level may offer some major advantages. A large part of my spam is formatted as whitespace-indented HTML tags. This is a stylistic trait that would appear to be a very strong diagnostic of spam, along with href="http:// and img src="http://. Word-based filters split both of these up into separate tokens, biasing the results somewhat. For some people the ability to filter tokens embedded in base-64 encoded messages can also be useful.
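A small illustration of the tokenization point, as a sketch only. The word tokenizer here is a deliberately simple stand-in, not the tokenizer of any particular filter, and the sample tag and names are made up.

```python
import re


def word_tokens(text: str) -> list:
    """Naive word tokenizer: runs of letters and digits only."""
    return re.findall(r"[A-Za-z0-9]+", text)


def char_ngrams(text: str, n: int) -> list:
    """Every substring of length n, punctuation included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sample = '<a href="http://example.com/">'
marker = 'href="http://'

# The word tokenizer breaks the marker into separate tokens...
print(word_tokens(sample))
# ...while character n-grams of the right length keep it in one piece.
print(marker in char_ngrams(sample, len(marker)))
```

A compressor works on the raw byte stream, so it sees the marker whole without anyone having to pick a tokenizer or an n-gram length.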

Re:Meet the Bayesian Filtering Algorythm on Using gzip As A Spam Filter · 2003-01-27 07:20 · Score: 1

I'm skeptical about heuristic filters, because of the possibility of the occasional false positive, which could be an embarrassment (or worse).

I think it is better (and recommended by filter programmers) to use the filters as an aid for classification rather than the end of classification, especially because filters such as spamassassin detect problems with the mail header that are difficult to eyeball at a glance. But honestly, having used spamassassin for the last year, I find the concern about false positives to be a bit overblown. Spamassassin just looks for the same features I look for to identify spam. Humans have false positive rates as well, so it is not obvious to me that a filter which examines the entire message would have a higher false positive rate than a human being scanning the from and subject lines.

So at some point, doesn't the sender bear some responsibility for composing a message in such a way that it looks and feels like spam? Almost all of the spam messages I get have more than a half-dozen features that are used to classify them as spam. About half of those features involve malformed header information that does not appear with almost any legitimate mail user agent. The claims that heuristic filters will mean missing the cold-call job offer or the dirty invitation from your sweetie are highly inflated.

Re:Same old problem... on Using gzip As A Spam Filter · 2003-01-27 04:46 · Score: 1

Filtering is not a true spam solution. All it takes is one false positive on a Really Important Email that gets accidentally deleted to totally destroy the value of any filtering system.

Given that, the alternative to having tagged emails automatically deleted is to collect them in a folder and scan the message senders and subject lines. If you're doing that, then the spammer is getting a pitch through to you in the subject line. This therefore does not lessen the incentive for the spammer, but simply causes him to change tactics and put his best pitch in his subject line.

I guess that this is an interesting question. I keep hearing this argument that filtering is a bad thing because of the risk of false positives. But how is the risk of false positives reduced by removing the filter? Spam filtering for me is a valuable cognitive aid. (One modification to spamassassin would be to put the spam score on the subject line.) I can live with skimming subject lines because many spam models are based on the number of hits from users who buy or click on links in spam.
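As a rough sketch of the subject-line idea, done outside the filter rather than as a change to spamassassin itself: the code below assumes the message already carries an X-Spam-Status header containing hits=N or score=N, and the function name and the "[spam N]" prefix are made up for illustration.

```python
import re
from email import message_from_string

# Matches "hits=7.2" or "score=7.2" (assumed header format).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def tag_subject(raw: str) -> str:
    """Prefix the Subject with the spam score found in X-Spam-Status.
    Messages without a recognizable score are returned untouched."""
    msg = message_from_string(raw)
    match = SCORE_RE.search(msg.get("X-Spam-Status", ""))
    if match:
        subject = msg.get("Subject", "")
        del msg["Subject"]  # no error if the header is absent
        msg["Subject"] = f"[spam {match.group(1)}] {subject}"
    return msg.as_string()
```

Something like this could sit in a mail delivery pipeline after the filter, so the score is visible while skimming the folder of tagged mail.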

I also think that it argues against a straw man. I don't read very many comments from people who believe that filtering is "the solution". However, content-based filtering is one valuable tool for sorting through large numbers of messages. By all means we should pursue transport-based and source-based strategies for fighting spam as well. But these have their own problems.

Finally, if someone wants to cold-call me out of the blue with a Really Important Message, don't they have a responsibility to compose their message without much of the hype and HTML text that gets flagged as spam? It would seem that such a cold-call would have no problems getting through as long as they don't make excessive use of all caps, font tags, embedded images, base-64 encoded text, and references to my penis. If it was really important enough to be worth my time, then it probably is not going to have enough spam features to be flagged as spam.
