Henry+Stern · Slashdot Mirror

Re:Really, it's research.overture.com on Yahoo! Research Labs · 2004-01-20 06:24 · Score: 1

If you take a look at their publications list, you'll find that the lab has been around for about a year. Tech report OR-2003-002 would have been submitted for publication to SIGIR in January 2003.

Rosie Jones, Dan Fain. Query Word Deletion Prediction, Overture Research Technical Report OR-2003-002

Some of their papers look quite interesting. Check them out at http://labs.yahoo.com/publications.xml.

Re:I read this and wonder about UNIX on Crack the Code and Win a Million Bucks · 2004-01-20 03:57 · Score: 1

If a black hatter can read your shadow file, you have bigger problems than protecting your 64-bit hashed password from them.

Come on now... on Local News Anchor Feels Pain from Afar · 2004-01-17 09:35 · Score: 4, Insightful

Aren't there more constructive ways of spending energy than complaining about a guy who is lucky enough to be able to work from his vacation home?

Yeeesh.

P.S. What does Clear Channel have to do with this, anyway?

Fatal flaw... on Intertrust Plans Universal DRM System · 2003-12-16 11:28 · Score: 1

What's the use of protecting it if I don't want it? I haven't bought anything involving DRM or copy protection (DVDs, copy protected CDs, e-books, software) for over a year and don't intend to start again in the future. Yay for freedom of choice!

Re:No doubt the OEMs have not been told on Embedded Device Manufacturers Ignoring GPL · 2003-11-30 06:16 · Score: 1

While Free Software developers do "give away" copies of their software, there are still development costs associated with it. At the very least, one could show that by using the unlicensed code the offender effectively stole X hours of development time which would have cost Y dollars per hour for them to develop on their own at the industry average rate. You might also be able to show damages for the time that their developers spent on modifying the software, since those lost contributions to the codebase would have to be written by someone else using the aforementioned cost formula.

Lastly, don't forget punitive damages.

Re:No doubt the OEMs have not been told on Embedded Device Manufacturers Ignoring GPL · 2003-11-29 16:32 · Score: 1

MS sues for real money.

The last sentence of your post brought a question to my mind: Why don't copyleft owners ever sue for money? It's not like using unlicensed Linux kernel code is any different than using an unlicensed copy of Windows CE. In both cases you are violating the terms of the license and in both cases you are potentially causing damages against the copyright holder.

Re:Here are some that come to mind... on Great Computer Science Papers? · 2003-11-16 02:40 · Score: 1

Seems as though a few were modded up to my threshold while I was typing my previous post up. Ignore my first paragraph!

Here are some that come to mind... on Great Computer Science Papers? · 2003-11-16 02:37 · Score: 4, Informative

Since nobody who has posted seems to have actually read any computer science papers, here are two that immediately come to my mind.

Vannevar Bush. As We May Think. Atlantic Monthly, July, 1945.

This paper put forth the very first ideas about how people can mechanically search for information. While we don't have desks with levers on them, we do have Google. :)

Tim Berners-Lee. Information Management: A Proposal. 1989.

This paper is where Tim Berners-Lee proposes what we now know as the World Wide Web. It's an interesting read if you'd like to see what the original intent of the web was so that you can compare it to what we have today.

A place to look for good old computer science papers is in older issues of Communications of the ACM. There are lots of articles in plain English that you may find of interest. If you are a university student, your school may have a subscription to the ACM Digital Library. If they do, you can read all the issues back to 1958.

Also, you can find a lot of interesting CS publications at Citeseer. They have a page with the top 200 most accessed papers of all time. When I skimmed through it, I saw quite a few titles that may be of interest.

Yeah, real smart... on Billy the Kid Faces The Law... Again · 2003-11-12 15:39 · Score: 1

And just what happens when they discover that none of those people are Billy the Kid?

Bye bye tourist dollars.

Renaming a directory on Home Directory In CVS · 2003-11-11 11:55 · Score: 4, Insightful

I shudder to think what happens when he tries to rename his files and directories.

Definitely the biggest problem with CVS. :(

Re:"If voting could really change things" on Diebold Chases Links To Leaked Memos · 2003-10-29 01:33 · Score: 1

This is attributed to Revolution Books, a chain of non-profit communist bookstores around the United States.

Here is an article posted at the Columbia University school of journalism about the store. http://www.jrn.columbia.edu/studentwork/cns/2003-03-30/121.asp

Here is an interview with the manager of the store, Joan Hirsch. http://www.furious.com/rev.html

I'm glad to see that there are still some people left who have the backbone to stand up for what they believe in.

Re:Huh? on VeriSign Responds To ICANN's SiteFinder Advisory · 2003-09-23 00:30 · Score: 1

Interesting. From the "Registry Code of Conduct":

1. [VeriSign Global Registry Services] will not show any preference or provide any special consideration to any ICANN-accredited registrar with regard to Registry Services provided for the .com TLD.

VeriSign is showing preference and providing special consideration for an ICANN-accredited registrar with regard to Registry Services provided for the .com TLD by allowing themselves to use the SiteFinder service. They don't allow other registrars to do so; therefore it is special consideration.

Re:How is this even possible? on Hotel Being Sued for Using the Dewey Decimal System · 2003-09-21 04:04 · Score: 1

According to this page, Melvil Dewey (1851-1931) anonymously published the system in 1876.

You'd think that it would be in the public domain by now, wouldn't you?

Re:Search on msdn.microsoft.com on Microsoft Works on Search Capabilities · 2003-09-19 06:38 · Score: 2, Interesting

Microsoft. Search experts.

What's done in the lab and what can actually be sold are very different things. The senior information retrieval researchers at MSR are *smart* people.

I had the opportunity to hear Susan Dumais' talk on "Stuff I've Seen" at SIGIR this year. SIS is a really interesting piece of software, a personal search engine. Every e-mail you send or receive, every file you create is fed into a search engine residing on your PC. You can then search for things by date, keyword, etc. and easily locate exactly what you're looking for.

Yeah, great search interface! Really inspires my confidence!

If anyone can topple Google, they can.

Re:Not so fast... on Canada Immune From RIAA? · 2003-09-16 04:24 · Score: 1

Forgot my citations. Sorry.

[1] Jay Currie. Blame Canada. http://techcentralstation.com/081803C.html

[2] Copyright Board of Canada. Fact Sheet: Private Copying 1999-2000 Decision. http://www.cb-cda.gc.ca/news/c19992000fs-e.html

Not so fast... on Canada Immune From RIAA? · 2003-09-16 03:54 · Score: 5, Insightful

To quote Jay Currie (emphasis mine):

The amendment to the Act legalized copying of sound recordings of musical works onto audio recording media for the private use of the person who makes the copy (referred to as "private copying"). [1]

Audio recording media is defined as "Analog Audio Cassette Tapes," "MiniDisc, CD-R Audio and CD-RW Audio" and "CD-R and CD-RW." [2] This does not include hard drives (I recall discussion of extending the levy to hard drives), so your hard drive is not "audio recording media" and thus the Act does not legalize file sharing.

This being said, it would be harder to argue if you immediately burned the downloaded songs to an audio CD, promptly deleting the copy on your hard drive.

Re:tagging bills together on Microsoft Money Leads To Street-Legal Porsche 959s · 2003-09-16 03:01 · Score: 1

Bills are bundled together and 100-dollar bills are bundled together.

Interesting...

What gives VeriSign the right? on Resolving Everything: VeriSign Adds Wildcards · 2003-09-15 14:39 · Score: 1

What gives VeriSign the right to unilaterally make this decision about how the internet will work? As has been mentioned, it breaks a lot of stuff and from what I've heard (admittedly, I haven't paid a lot of attention), nobody except them seems to want it.

A network with no single point of failure? Pah!

No more bullshitting. on Adrian Lamo Charged With Hacking · 2003-09-06 03:16 · Score: 1

If NYT wanted a security audit of their system, they would have paid someone to do it. Since they did not, they obviously didn't want one. Good intentions or not, Lamo broke the law and deserves to face the consequences of his actions.

I realize that it's "chic to be geek" here with the whole "white hat" hacking stuff, but be realistic. After all, you don't see people doing the physical analogue of white hat hacking. That's B&E.

Interesting article but unsound methodology on Seven Spam Filters Compared · 2003-08-23 09:59 · Score: 2, Insightful

Sam's article was a very interesting read, but his results need to be taken with a grain of salt.

To show that one piece of software outperforms another, you need to prove statistical significance. This can be done in two ways:

The first method is called the paired (or "pairwise") t-test. What you need to do is to run k tests using different training and test data. For each of these tests, you find the accuracy of each classifier (#success/#trials). Then, you form the "t-statistic," t = d/sqrt(sigma_d^2 / k), where d is the mean of the per-test differences between the two classifiers' accuracies, sigma_d^2 is the variance of those differences and k is the number of tests. Then, you compare your t-statistic to the Student's t-distribution with k-1 degrees of freedom. Typically, you want a confidence level of 90% or 95%, so you look up the critical value for that level (e.g. the one-tailed 90% critical value with 9 degrees of freedom is 1.38). If your t-statistic is greater than the critical value, then the difference between the two classifiers is statistically significant with X% confidence. Read more about this in Witten and Frank's Data Mining book.
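
As a rough sketch of the arithmetic above, here is the t-statistic in Python. The function name and the accuracy figures are made up for illustration, and the 1.383 critical value is the standard one-tailed 90% table entry for 9 degrees of freedom. Treat it as an example under those assumptions, not as the procedure from the book.

```python
import math
from statistics import mean, variance


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for two lists of paired accuracies.

    acc_a[i] and acc_b[i] must come from the same train/test split.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    d_bar = mean(diffs)        # mean of the per-run differences
    var_d = variance(diffs)    # sample variance (divides by k - 1)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_bar / math.sqrt(var_d / k)
    return t, k - 1


if __name__ == "__main__":
    # Hypothetical accuracies from 10 paired runs (not real results).
    filter_a = [0.95, 0.93, 0.96, 0.94, 0.97, 0.95, 0.92, 0.96, 0.94, 0.95]
    filter_b = [0.93, 0.92, 0.95, 0.94, 0.94, 0.93, 0.92, 0.93, 0.93, 0.94]
    t, dof = paired_t_statistic(filter_a, filter_b)
    critical_90_dof9 = 1.383   # one-tailed 90% value from a t-table, 9 d.o.f.
    print(f"t = {t:.3f} with {dof} degrees of freedom")
    print("significant at 90%" if t > critical_90_dof9 else "not significant")
```

Pairing matters here: both classifiers have to be scored on the same splits, otherwise the differences mean nothing.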

The other method is called Analysis of Variance (ANOVA). I'm not familiar enough with this method to explain it here, but it allows you to choose from a set of experiments which ones really are above the average. Dig around in your statistics books or on the web for more information.

Sam should have made use of one of these techniques when doing his analysis. Since he only ran one experiment per configuration of his classifier, you can draw no real conclusions from the data presented (it's a Student's t-distribution with 0 degrees of freedom... essentially flat!).

Since most of us only have a small number of corpora kicking around (maybe even only one!), you can use a method called "cross validation" to give yourself a larger number of data sets than you actually have. When doing a cross validation, you divide your corpus up into k "folds" and then perform k experiments. In each experiment, you set aside one fold of your data for testing and train on the other k-1 folds. Since you're using different test data each time, each experiment can be considered to be different and then you can use a pairwise t-test to prove statistical significance. There are other methods that you can use such as "leave one out" where you have as many folds as you do pieces of training data and "bootstrapping" where you sample your training data with replacement and test with whatever wasn't sampled for training.
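
The splitting schemes in the paragraph above could look something like this in Python. Both function names are hypothetical, the corpus is just a list of anything, and the training and scoring of the classifier are left out. This is a sketch of the bookkeeping only.

```python
import random


def k_fold_splits(items, k):
    """Yield k (train, test) pairs; each item is tested exactly once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    # Deal the items into k folds, round-robin.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def bootstrap_split(items, rng):
    """Sample len(items) items with replacement for training.

    Whatever was never drawn becomes the test set.
    """
    n = len(items)
    drawn = [rng.randrange(n) for _ in range(n)]
    drawn_set = set(drawn)
    train = [items[i] for i in drawn]
    test = [items[i] for i in range(n) if i not in drawn_set]
    return train, test


if __name__ == "__main__":
    corpus = [f"message-{i}" for i in range(10)]   # stand-in corpus
    for train, test in k_fold_splits(corpus, 5):
        print(len(train), "training items,", len(test), "test items")
    # "Leave one out" is just k-fold with k equal to the corpus size.
    loo = list(k_fold_splits(corpus, len(corpus)))
    print(len(loo), "leave-one-out experiments")
    train, test = bootstrap_split(corpus, random.Random(42))
    print(len(train), "bootstrap training items,", len(test), "held out")
```

You would normally shuffle the corpus once before dealing it into folds, unless the order carries meaning.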

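However, cross validation may not be appropriate for incremental learning algorithms if your data is on a timeline (such as e-mail), because shuffled folds let the classifier train on messages that arrived after the ones it is tested on. Instead, you can break your corpus up into chronological pieces and do your evaluation on those.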
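
One possible way to evaluate on time-ordered pieces is to train on everything up to a cut-off and test on the next piece, then move the cut-off forward. That scheme is my own assumption about how to do it, and the function name is hypothetical. The messages must already be sorted oldest first.

```python
def chronological_splits(messages, pieces):
    """Yield (train, test) pairs that never train on the future.

    The corpus is cut into `pieces` consecutive chunks. Each chunk after
    the first is used once as test data, with all earlier chunks as
    training data.
    """
    n = len(messages)
    if not 2 <= pieces <= n:
        raise ValueError("pieces must be between 2 and len(messages)")
    # Chunk boundaries; integer arithmetic keeps every chunk non-empty.
    bounds = [i * n // pieces for i in range(pieces + 1)]
    for i in range(1, pieces):
        train = messages[:bounds[i]]
        test = messages[bounds[i]:bounds[i + 1]]
        yield train, test


if __name__ == "__main__":
    mailbox = list(range(100))   # stand-in for messages in arrival order
    for train, test in chronological_splits(mailbox, 5):
        print(f"train on {len(train)} older messages, test on {len(test)}")
```

This gives one fewer experiment than the number of pieces, since the first piece has nothing older to train on.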

Proving statistical significance is very easy and allows you to be confident in the conclusions that you make in your publications. It's the scientific method!

Good luck!

Henry

The trick that got me where I am today... on How Do You Get Work Done? · 2003-07-27 08:36 · Score: 1

The trick that helped me get my undergraduate thesis done was simply to unplug myself. I'd turn off my cell phone, pick up my laptop, leave the NIC at home and go out and find a quiet place to work with no distractions.

If you simply can't unplug yourself, turn off your IRC client, instant messaging services and e-mail client. With a little self-discipline (no reading /.!), you'll be surprised what you can get accomplished in a day.

I hope you get some useful tips out of people today. Good luck with your studies!

Henry
(Now a post-bachelor Ph.D. student, thanks to this technique.)

Re:Lies! on When Good Spammers Go Bad · 2003-07-21 10:34 · Score: 1

So when good spammers go bad, we have zombie spammers?

Where's my lawnmower? It's choppin' time.

SVG for data visualization on Mozilla Gets (Beta) Native SVG support · 2003-07-19 23:48 · Score: 2, Interesting

This might be a bit off topic, but I want to use SVG for data visualization and have been having trouble finding suitable software.

The SVG implementations I've found so far either have no external user interface with nice things like scrollbars (Adobe/Corel) or can't handle my very large graphics (everything else I've seen).

I've been very disappointed about this lack of good viewers. SVG is well-suited for data visualization and could become a "killer app" with the right software support.

Re:Wow, all this.. on The Star Wars Alphabet Project · 2003-07-18 00:15 · Score: 0

Whatever you do, please don't post pictures of that on Slashdot. :)

Enough... on Darl McBride Interview · 2003-06-30 01:49 · Score: 1

I'm not going to dignify this article or any future articles by reading or commenting on them. I suggest that you all stop feeding the SCO troll and do the same.

You'd think that slashdotters would know a troll when they see one. :) Does he need to redirect www.sco.com to www.goatse.cx to show his true colours?

Comments · 140